The nature of distributed computation has often been described in terms of the component
operations of universal computation: information storage, transfer and modification. We introduce
the first complete framework that quantifies each of these individual information dynamics on a 
local scale within a system, and describes the manner in which they interact to create non-trivial 
computation where "the whole is greater than the sum of the parts". We apply the framework to 
cellular automata, a simple yet powerful model of distributed computation. In this application, the 
framework is demonstrated to be the first to provide quantitative evidence for several important 
conjectures about distributed computation in cellular automata: that blinkers embody information 
storage, particles are information transfer agents, and particle collisions are information modifica- 
tion events. The framework is also used to investigate and contrast the computations conducted 
by several well-known cellular automata, highlighting the importance of information coherence in 
complex computation. Our results provide important quantitative insights into the fundamental 
nature of distributed computation and the dynamics of complex systems, as well as impetus for the 
framework to be applied to the analysis and design of other systems. 
I. INTRODUCTION



The nature of distributed computation has long been 
a topic of interest in complex systems science, physics, 
artificial life and bioinformatics. In particular, emergent 
complex behavior has often been described from the per- 
spective of computation within the system [ll, [3] and has 
been postulated to be associated with the capability to 
support universal computation [3j, J,, ^ . 

In all of these relevant fields, distributed computation 
is generally discussed in terms of "memory" , "commu- 
nication" , and "processing" . Memory refers to the stor- 
age of information by an agent or process to be used in 
its future. It has been investigated in coordinated mo- 
tion in modular robots [Q] , in the dynamics of inter-event 
distribution times 0], and in synchronization between 
coupled systems. Communication refers to the transfer
of information between one agent or process and another;
it has been shown to be of relevance to biological
systems (e.g. dipole-dipole interaction in microtubules
[9], and in signal transduction by calcium ions llCll ), social
animals (e.g. schooling behavior in fish), and
agent-based systems (e.g. the influence of agents over
their environments [12], and in inducing emergent neural
structure [l^l). Processing refers to the combination of
stored and/or transmitted information into a new form; 
it has been discussed in particular for biological neural 



networks and models thereof [Tj, [T^ [T4I (where it 
has been suggested as a potential biological driver), and 
also regarding collision-based computing (e.g. [la . [1^,
and including soliton dynamics and collisions [20]).

Significantly, these terms correspond to the compo- 
nent operations of Turing universal computation: infor- 
mation storage, information transfer (or transmission) 
and information modification. Yet despite the obvious 
importance of these information dynamics, we have no 
framework for either quantifying them individually or un- 
derstanding how they interact to give rise to distributed 
computation. Here, we present the first complete frame- 
work which quantifies each of the information dynam- 
ics or component operations of computation on a local 
scale and describes how they inter-relate to produce dis- 
tributed computation. Our focus on the local scale within 
the system is an important one. Several authors have 
suggested that a complex system is better characterized 
by studies of its local dynamics than by averaged or overall
measures (e.g. [21], [22]), and indeed here we believe
that quantifying and understanding distributed compu- 
tation will necessitate studying the information dynam- 
ics and their interplay on a local scale in space and time. 
Additionally, we suggest that the quantification of the in- 
dividual information dynamics of computation provides 
three axes of complexity within which to investigate and 
classify complex systems, allowing deeper insights into 
the variety of computation taking place in different sys- 
tems. 
An important focus for discussions on the nature of distributed
computation has been cellular automata (CAs)
as model systems offering a range of dynamical behavior,
including supporting complex computations and the
ability to model complex systems in nature. We select
CAs for experimentation here because there is very
clear qualitative observation of emergent structures rep- 
resenting information storage, transfer and modification 
therein (e.g. [J, Q). CAs are a critical proving ground 
for any theory on the nature of distributed computation: 
significantly, Von Neumann was known to be a strong be- 
liever that "a general theory of computation in 'complex 
networks of automata' such as cellular automata would 
be essential both for understanding complex systems in 
nature and for designing artificial complex systems" (flj 
describing p3|). 

Information theory provides the logical platform for 
our investigation, and we begin with a summary of the 
main information-theoretic concepts required. We pro- 
vide additional background on the qualitative nature of 
distributed computation in CAs, highlighting the oppor- 
tunity for our framework to provide quantitative insights 
here. Subsequently, we consider each component opera- 
tion of universal computation in turn, and describe how 
to quantify it locally in a spatiotemporal system. As an 
application, we measure each of these information dy- 
namics at every point in space-time in several important 
CAs. Our framework provides the first complete quan- 
titative evidence for a well-known set of conjectures on 
the emergent structures dominating distributed compu- 
tation in CAs: that blinkers provide information storage, 
particles provide information transfer, and particle col- 
lisions facilitate information modification. Furthermore, 
our results imply that the coherence of information may 
be a defining feature of complex distributed computation. 
Our findings are significant because these emergent struc- 
tures of computation in CAs have known analogues in 
many physical systems (e.g. solitons and biological pat- 
tern formation processes) , and as such this work will con- 
tribute to our fundamental understanding of the nature 
of distributed computation and the dynamics of complex 
systems. 



II. INFORMATION-THEORETICAL 
PRELIMINARIES 

Information theory is an obvious tool for quantifying 
the information dynamics involved in distributed com- 
putation. In fact, information theory has already proven 
to be a useful framework for the design and analysis of 
complex self-organized systems. Here, we will extend
this success to describing distributed computation
in complex systems. 

We begin by reviewing several necessary information 
theoretic quantities (generally following the formulation 
in [25]). The fundamental quantity is the Shannon entropy,
which represents the uncertainty associated with
any measurement x of a random variable X (units in bits):

H_X = -\sum_{x} p(x) \log_2 p(x).    (1)

The joint entropy of two (or more) random variables
X and Y is a generalization to quantify the uncertainty
of the joint distribution of X and Y:

H_{X,Y} = -\sum_{x,y} p(x,y) \log_2 p(x,y).    (2)

The conditional entropy of X given Y is the average 
uncertainty that remains about x when y is known: 

H_{X|Y} = -\sum_{x,y} p(x,y) \log_2 p(x|y).    (3)

The mutual information between X and Y measures 
the average reduction in uncertainty about x that results 
from learning the value of y, or vice versa: 

I_{X;Y} = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x) p(y)},    (4)

I_{X;Y} = H_X - H_{X|Y} = H_Y - H_{Y|X}.    (5)

The conditional mutual information between X and 
Y given Z is the mutual information between X and Y 
when Z is known: 

I_{X;Y|Z} = H_{X|Z} - H_{X|Y,Z}    (6)
          = H_{Y|Z} - H_{Y|X,Z}.    (7)
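
As a concrete illustration of Eqs. (1)-(7), the following minimal sketch estimates each quantity from observed samples using plug-in (frequency-count) probabilities. The function names are hypothetical and chosen for illustration only; the conditional entropy is computed through the standard chain-rule identity H_{X|Y} = H_{X,Y} - H_Y, which is equivalent to Eq. (3).

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy in bits, Eq. (1); samples must be hashable."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def joint_entropy(xs, ys):
    """Joint entropy of paired samples, Eq. (2)."""
    return entropy(list(zip(xs, ys)))


def conditional_entropy(xs, ys):
    """H_{X|Y}, via the chain rule H_{X,Y} - H_Y (equivalent to Eq. (3))."""
    return joint_entropy(xs, ys) - entropy(ys)


def mutual_information(xs, ys):
    """I_{X;Y} = H_X - H_{X|Y}, Eq. (5)."""
    return entropy(xs) - conditional_entropy(xs, ys)


def conditional_mutual_information(xs, ys, zs):
    """I_{X;Y|Z} = H_{X|Z} - H_{X|Y,Z}, Eq. (6)."""
    yz = list(zip(ys, zs))
    return conditional_entropy(xs, zs) - conditional_entropy(xs, yz)
```

For example, if X is the exclusive-or of two independent fair bits Y and Z, then Y alone says nothing about X, but Y does resolve X once Z is known; the conditional mutual information captures this where the plain mutual information does not.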

The entropy rate is the limiting value of the rate of
change of the joint entropy over k consecutive states of
X (i.e. measurements x^{(k)} of the random variable X^{(k)}),
as k increases [26]:

H_{\mu X} = \lim_{k \to \infty} \frac{H_{X^{(k)}}}{k} = \lim_{k \to \infty} H'_{\mu X}(k),    (8)

H'_{\mu X}(k) = \frac{H_{X^{(k)}}}{k}.    (9)

The entropy rate can also be expressed as the limiting
value of the conditional entropy of the next state of X
(i.e. measurements x_{n+1} of the random variable X')
given knowledge of the previous k states of X (i.e. measurements
x_n^{(k)}, up to and including time step n, of the
random variable X^{(k)}):

H_{\mu X} = \lim_{k \to \infty} H_{X'|X^{(k)}} = \lim_{k \to \infty} H_{\mu X}(k),    (10)

H_{\mu X}(k) = H_{X^{(k+1)}} - H_{X^{(k)}}.    (11)
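
The finite-k estimate in Eq. (11) can be sketched directly from block entropies of a single time series. This is only an illustrative plug-in estimator under the assumption of a stationary series; the function names are hypothetical, and the entropy of the empty block (k = 0) is taken to be zero.

```python
from collections import Counter
from math import log2


def block_entropy(series, k):
    """Plug-in entropy (bits) of length-k blocks; the k = 0 block has entropy 0."""
    if k == 0:
        return 0.0
    blocks = [tuple(series[i:i + k]) for i in range(len(series) - k + 1)]
    n = len(blocks)
    return -sum((c / n) * log2(c / n) for c in Counter(blocks).values())


def entropy_rate_estimate(series, k):
    """Finite-k entropy rate H_{mu X}(k) = H_{X^{(k+1)}} - H_{X^{(k)}}, Eq. (11)."""
    return block_entropy(series, k + 1) - block_entropy(series, k)
```

For a period-2 series such as 0, 1, 0, 1, ... the estimate is 1 bit at k = 0 (nothing is known about the past) and drops to approximately 0 bits at k = 1, since one past state already determines the next state.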

Grassberger [23] first noticed that a slow approach
of the entropy rate to its limiting value was a sign of
complexity. Formally, Crutchfield and Feldman [20] use
the conditional entropy form of the entropy rate, Eq. (10) [69],
to observe that at a finite block size k, the difference
H_{\mu X}(k) - H_{\mu X} represents the information carrying capacity
in size k-blocks that is due to correlations. The
sum over all k gives the total amount of structure in the
system, quantified as excess entropy (measured in bits):



E_X = \sum_{k=0}^{\infty} \left[ H_{\mu X}(k) - H_{\mu X} \right].    (12)



The excess entropy can also be formulated as the mu- 
tual information between the semi-infinite past and semi- 
infinite future of the system: 



E_X = \lim_{k \to \infty} I_{X^{(k)}; X^{(k^+)}},    (13)



where X^{(k^+)} is the random variable (with measurements
x_{n+1}^{(k^+)}) referring to the k future states of X (from time
step n + 1 onwards). This interpretation is known as
the predictive information psl ], as it highlights that the
excess entropy captures the information in a system's
past which is relevant to predicting its future.
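
A finite-k version of the predictive information form in Eq. (13) can be sketched as the mutual information between the k states before each time step and the k states from that step onwards. The code below is an illustrative plug-in estimator only; the function names are hypothetical, and a single long stationary series is assumed so that frequencies over time stand in for probabilities.

```python
from collections import Counter
from math import log2


def _entropy(items):
    """Plug-in Shannon entropy in bits of a list of hashable outcomes."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def excess_entropy_estimate(series, k):
    """Finite-k estimate of Eq. (13): I(k-past ; k-future) along one series."""
    pasts, futures = [], []
    for t in range(k, len(series) - k + 1):  # t plays the role of n + 1
        pasts.append(tuple(series[t - k:t]))
        futures.append(tuple(series[t:t + k]))
    joint = list(zip(pasts, futures))
    return _entropy(pasts) + _entropy(futures) - _entropy(joint)
```

A period-2 series stores about 1 bit (its phase), whereas a constant series stores nothing, which matches the reading of excess entropy as the information in the past that is relevant to predicting the future.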



III. CELLULAR AUTOMATA 
A. Introduction to Cellular Automata 

Cellular automata (CA) are discrete dynamical sys- 
tems consisting of an array of cells which each syn- 
chronously update their discrete state as a function of 
the states of a fixed number of spatially neighboring cells 
using a uniform rule. Although the behavior of each in- 
dividual cell is very simple, the (non-linear) interactions 
between all cells can lead to very intricate global behav- 
ior meaning CAs have become a classic example of self- 
organized complex behavior. Of particular importance, 
CAs have been used to model real-world spatial dynam- 
ical processes, including fluid flow, earthquakes and bio- 
logical pattern formation ^ . 

The neighborhood of a cell used as inputs to its update
rule at each time step is usually some regular configuration.
In 1D CAs, this means the same range r of cells on
each side and including the current state of the updating
cell. One of the simplest varieties of CAs - 1D CAs using
binary states, deterministic rules and one neighbor on either
side - is known as the Elementary CAs, or ECAs.
Example evolutions of ECAs from random initial conditions
may be seen in Fig. 2(a) and Fig. 6(a). For more
complete definitions of CAs, including the definition of
the Wolfram rule number convention for specifying update
rules, see [29].
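
To make the ECA definition concrete, the following minimal sketch updates a row of binary cells under a given Wolfram rule number. It assumes the standard convention in which the three-cell neighborhood (left, center, right) is read as a binary number that indexes a bit of the rule number; the periodic boundary conditions and function names are assumptions made for illustration.

```python
def eca_step(cells, rule):
    """One synchronous update of an elementary CA with periodic boundaries.

    The neighborhood (left, center, right) is read as a 3-bit number v,
    and the new state is bit v of the Wolfram rule number.
    """
    n = len(cells)
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def eca_run(cells, rule, steps):
    """Return the space-time history: the initial row plus `steps` updates."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(eca_step(history[-1], rule))
    return history
```

For instance, `eca_run([0, 0, 1, 0, 0], 110, 1)` applies rule 110 once to a single "on" cell; each row of the returned history is one time step of the kind of space-time diagram referred to in the figures.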

Wolfram [1, 29] sought to classify the asymptotic
behavior of CA rules into four classes: I. Homoge- 
neous state; II. Simple stable or periodic structures; III. 
Chaotic aperiodic behavior; IV. Complicated localized 
structures, some propagating. Much conjecture remains 
as to whether these classes are quantitatively distinguish- 
able (e.g. see [lOl), however they do provide an interest- 
ing analogy (for discrete state and time) to our knowledge 



of dynamical systems, with classes I and II representing 
ordered behavior, class III representing chaotic behavior, 
and class IV representing complex behavior and consid- 
ered as lying between the ordered and chaotic classes. 

More importantly, the approach seeks to character- 
ize complex behavior in terms of emergent structure in 
CAs, surrounding gliders, particles and domains. Qualitatively,
a domain may be described as a set of background
configurations in a CA, for which any given configura- 
tion will update to another such configuration in the set 
in the absence of any disturbance. Domains are formally 
defined within the framework of computational mechan- 
ics d^l as spatial process languages in the CA. Particles 
are qualitatively considered to be moving elements of co- 
herent spatiotemporal structure. Gliders are particles 
which repeat periodically in time while moving spatially 
(repetitive non-moving structures are known as blinkers). 
Formally, particles are defined within the framework of 
computational mechanics as a boundary between two do- 
mains [12]; as such, they can also be termed as domain 
walls, though this is typically used with reference to ape- 
riodic particles. 

These emergent structures are more clearly visible 
when the CA is filtered in some way, using for example
ε-machines [l^], input entropy Isil, local information [3^,
or local statistical complexity [21]. All of these filtering
techniques produce a single filtered view of the structures 
in the CA: our measures of local information dynamics 
will present several filtered views of the distributed com- 
putation in a CA. The ECA examples analyzed in this 
paper are introduced in Section III C.



B. Computation in Cellular Automata 

CAs can be interpreted as undertaking distributed 
computation: it seems fairly clear that "data represented 
by initial configurations is processed by time evolution" 
[4|. As such, computation in CAs has been a popular 
topic for study (see jlj), with a particular focus in ob- 
serving or constructing (Turing) universal computation 
in certain CAs. An ability for universal computation is 
defined to be where "suitable initial configurations can 
specify arbitrary algorithm procedures" in the computing 
entity, which is capable of "evaluating any (computable) 
function" Q. Wolfram conjectured that all class IV com- 
plex CAs were capable of universal computation d, [ll] . 
He went on to state that prediction in systems exhibiting 
universal computation is limited to explicit simulation of 
the system, as opposed to the availability of any simple 
formula or "short-cut" , drawing parallels to the halting 
problem for universal Turing machines 0, [s^ which are 
echoed by Langton 3] and Casti Q . (Casti extended the 
analogy to undecidable statements in formal systems, i.e. 
Godel's Theorem). The capability for universal compu- 
tation has been proven for several CA rules, through the 
design of rules generating elements to (or by identifying 
elements which) specifically provide the component operations
required for universal computation: information
storage, transmission and modification. Examples here
include most notably the Game of Life [3J| and ECA rule
110 dl]; also see [3g| and discussions in [1].

The focus on elements providing information storage, 
transmission and modification pervades discussion of all 
types of computation in CAs (e.g. also see [H, [svj). 
Wolfram claimed that in class III CAs information prop- 
agates over an infinite distance at a finite speed, while 
in class IV CAs information propagates irregularly over 
an infinite range [33]. Langton [3| hypothesized that 
complex behavior in CAs exhibited the three component 
operations required for universal computation. He sug- 
gested that the more chaotic a system becomes the more 
information transmission increases, and the more ordered 
a system becomes the more information it stores. Com- 
plex behavior was said to occur at a phase transition 
between these extremes requiring an intermediate level 
of both information storage and transmission: if infor- 
mation propagates too well, coherent information decays 
into noise. Langton elaborates that transmission of infor- 
mation means that the "dynamics must provide for the 
propagation of information in the form of signals over ar- 
bitrarily long distances" , and suggests that particles in 
CAs form the basis of these signals. To complete the 
qualitative identification of the elements of computation 
in CAs, he also suggested that blinkers formed the basis 
of information storage, and collisions between propagat- 
ing (particles) and static structures (blinkers) "can mod- 
ify either stored or transmitted information in the sup- 
port of an overall computation" . Rudimentary attempts 
were made at quantifying the average information trans- 
fer (and to some extent information storage) , via mutual 
information (although as discussed later this is a sym- 
metric measure not capturing directional transfer). Rec- 
ognizing the importance of the emergent structures to 
computation, several examples exist of attempts to au- 
tomatically identify CA rules which give rise to particles 
and gliders, e.g. [HI, [11], suggesting these to be the most 
interesting and complex CA rules. 

Several authors however criticize the aforementioned
approaches of attempting to classify CAs in terms of
their generic behavior or "bulk statistical properties",
suggesting that the wide range of differing dynamics taking
place across the CA makes this problematic [l], [l^ .
Gray suggests that there may indeed be classes of
CAs capable of more complex computation than univer- 
sal computation alone [3^. More importantly, Hanson 
and Crutchfield [55] criticize the focus on universal com- 
putational ability as drawing away from the ability to 
identify "generic computational properties", i.e. a lack 
of ability for universal computation does not mean a CA 
is not undertaking any computation at all. Alternatively, 
these studies suggest that analyzing the rich space-time 
dynamics within the CA is a more appropriate focus. 
As such, these and other studies have analyzed the lo- 
cal dynamics of intrinsic or other specific computation, 
focusing on particles facilitating the transfer of informa- 



tion and collisions facilitating the information processing. 
Noteworthy examples here include: the method of apply- 
ing filters from the domain of computational mechanics 
by Hanson and Crutchfield [53]; and analysis using such 
computational mechanics filters of CA rules selected via 
evolutionary computation to perform classification tasks 
by Mitchell et al [3^, [40|. Related are studies which 
deeply investigate the nature of particles and their in- 
teractions (e.g. particle types and their interaction prod- 
ucts identified for particular CAs in [iol lilllip. and rules 
established for their interaction products in [431]). 

Despite such interest, there is no complete framework 
that locally quantifies the individual information dynam- 
ics of distributed computation within CAs or other sys- 
tems. In this study, we outline how the information dy- 
namics can be locally quantified within the spatiotem- 
poral structure of a CA. In particular, we describe the 
dynamics of how information storage and information 
transfer interact to give rise to information processing. 
Our approach is not to quantify computation or overall 
complexity, nor to identify universal computation or de- 
termine what is being computed; it is simply intended to 
quantify the component operations in space-time. 



C. Examples of distributed computation in CAs 

In this paper, we will examine the computation carried 
out by several important ECA rules:

• Class IV complex rules 110 and 54 [29] (see Fig. 4(a)
and Fig. 2(a)), both of which exhibit a number of
glider types and collisions. ECA rule 110 is the only
proven computationally universal ECA rule [35].

• Rules 22 and 30 as representative class III chaotic
rules [2^ (see rule 22 in Fig. 7(a));

• Rule 18 as a class III rule which contains domain
walls against a chaotic background domain [l^, [3].

These CAs each carry out an intrinsic computation of 
the evolution to their ultimate attractor and phase on it 
(see 3i] for a discussion of attractors and state space in 
finite-sized CAs). 

We also examine a CA carrying out a "human-understandable"
computational task. φ_par is a 1D
CA with range r = 3 (Wolfram rule number
0xfeedffdec1aaeec0eef000a0e1a020a0) that was evolved
by Mitchell et al. [s^, [4^] to classify whether the initial
CA configuration had a majority of 1's or 0's by reaching
a fixed-point configuration of all 1's for the former or all
0's for the latter. This CA rule achieved a success rate
above 70% in its task. An example evolution of this CA
can be seen in Fig. 5(a). The CA appears to carry out
this computation using blinkers and domains for information
storage, gliders for information transfer and glider
collisions for information modification. The CA exhibits
an initial emergence of domain regions of all 1's or all 0's
storing information about local high densities of either
state. Where these domains meet, a checkerboard domain
propagates slowly (1 cell per time step) in both directions,
transferring information of a soft uncertainty in
this part of the CA. Some "certainty" is provided where
the checkerboard encounters a blinker boundary between
0 and 1 domains, which stores information about a hard
uncertainty in that region of the CA. This results in an 
information modification event where the domain on the 
opposite side of the blinker to the incoming checkerboard 
is concluded to represent the higher density state, and is 
allowed to propagate over the checkerboard. Because of 
the greater certainty attached to this decision, this new 
information transfer occurs at a faster speed (3 cells per 
time step); it can overrun checkerboard regions, and in 
fact collisions of opposing types of this strong propaga- 
tion give rise to the (hard uncertainty) blinker boundaries 
in the first place. The final configuration is therefore the 
result of this distributed computation. 
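
The density classification rule can be unpacked into an explicit lookup table. The sketch below is illustrative only: it assumes the rule number is read as the hexadecimal value 0xfeedffdec1aaeec0eef000a0e1a020a0 under the Wolfram convention (bit v of the rule number gives the output for the 7-cell neighborhood whose binary value is v, leftmost cell most significant), and it assumes periodic boundaries. The names `PHI_PAR`, `rule_table` and `ca_step` are hypothetical.

```python
# Rule number for the r = 3 density classification CA, read as hexadecimal
# (assumption: Wolfram convention, bit v = output for neighborhood value v).
PHI_PAR = 0xfeedffdec1aaeec0eef000a0e1a020a0
RADIUS = 3


def rule_table(rule_number, radius):
    """Expand a rule number into a list indexed by neighborhood value."""
    size = 2 ** (2 * radius + 1)
    return [(rule_number >> v) & 1 for v in range(size)]


def ca_step(cells, table, radius):
    """One synchronous update of a 1D binary CA with periodic boundaries."""
    n = len(cells)
    nxt = []
    for i in range(n):
        v = 0
        for offset in range(-radius, radius + 1):
            v = (v << 1) | cells[(i + offset) % n]  # leftmost cell is the high bit
        nxt.append(table[v])
    return nxt
```

Under these assumptions the all-0 and all-1 configurations are fixed points of the table, consistent with the two target configurations of the classification task.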

Quantification of the local information dynamics via 
these three axes of complexity (information storage, 
transfer and modification) will provide quite detailed in- 
sights into the distributed computation carried out in a 
system. In all of these CAs we expect local measures 
of information storage to highlight blinkers and domain 
regions, local measures of information transfer to high- 
light particles (including gliders and domain walls), and 
local measures of information modification to highlight 
particle collisions. 

This will provide a deeper understanding of computation
than single or generic measures of bulk statistical
behavior, from which conflict often arises in attempts to
provide classification of complex behavior. In particular,
we seek clarification on the long-standing debate regarding
the nature of computation in ECA rule 22. Suggestions
that rule 22 is complex include the difficulty in estimating
the metric entropy (i.e. temporal entropy rate)
for rule 22 in [27], due to "complex long-range effects,
similar to a critical phenomenon" ^iE\. This effectively
corresponds to an implication that rule 22 contains
an infinite amount of memory (see Section IV A). Also,
from an initial condition of only a single "on" cell, rule
22 forms a pattern known as the "Sierpinski Gasket" [2^
which exhibits clear self-similar structure. Furthermore,
rule 22 is a 1D mapping of the 2D Game of Life CA
(known to have the capability for universal computation
[34]) and in this sense is referred to as "life in one dimension"
[4^, and complex structure in the language generated
by iterations of rule 22 has been identified [43]. Also,
we report here that we have investigated the C_1 complexity
measure [48] (an enhanced version of the variance of
the input entropy [Slj) for all ECAs, and found rule 22 to
clearly exhibit the largest value of this metric (0.78 bits
to rule 110's 0.085 bits). On the other hand, suggestions
that rule 22 is not complex include its high sensitivity
to initial conditions leading to Wolfram classifying it as
class III chaotic [2^. Gutowitz and Domain claim
this renders it as chaotic despite the subtle long-range
effects it displays, further identifying its fast statistical



convergence, and exponentially long and thin transients 
in state space (see [31|). Importantly, no coherent struc- 
ture (particles, collisions, etc.) is found for rule 22 using 
a number of known filters for such structure (e.g. local 
statistical complexity [21]): this reflects the paradigm
shift to an examination of local dynamics rather than 
generic, overall or averaged analysis. In our approach, 
we seek to combine this local viewpoint of the dynamics 
with a quantitative breakdown of the individual elements 
of computation, and will investigate rule 22 in this light. 

IV. INFORMATION STORAGE 

In this section we outline methods to quantify informa- 
tion storage. We describe how total information storage 
is captured by excess entropy, and introduce active in- 
formation storage to capture the amount of information 
storage that is currently in use. We present the first 
application of local profiles of both measures to cellular 
automata. 



A. Excess entropy as total information storage 

Although discussion of information storage or memory
in CAs has often focused on periodic structures (particularly
in construction of universal Turing machines),
information storage does not necessarily entail periodicity.
The excess entropy (Eqs. (12) and (13)) more broadly
encompasses all types of structure and memory by capturing
correlations across all lengths of time, including non-linear
correlations. It is quite clear from the predictive
information formulation of the excess entropy, Eq. (13) -
as the information from a system's past that is relevant
to predicting its future - that it is a measure of the total
information storage in a system.

We use the term single-agent excess entropy to refer 
to measuring the excess entropy for individual agents 
or cells using their one-dimensional time series of states. 
This is a measure of the average memory for each agent. 
Furthermore, we use the term collective excess entropy to
refer to measuring the temporal excess entropy for a collective
of agents (e.g. a set of neighboring cells in a CA)
using their two-dimensional time series of states. Considered
as the mutual information between their joint past
and future (i.e. a joint temporal predictive information),
this is a measure of the average total memory stored in
the collective (i.e. stored collectively by a set of cells in a
CA). Collective excess entropy could be used for example
to quantify the "undiscovered collective memory that
may present in certain fish schools" [T]| .

Grassberger studied temporal entropy rate estimates
for several ECAs in [23], in order to gain insights into
their excess entropies. These studies estimated temporal
entropy rates H_{\mu,N}(k) for spatial blocks of size N as N is
increased. Estimated values of H_{\mu,N}(k) (for N = 1 and
in the limit as N \to \infty) were cataloged for most ECAs
in a table of statistical properties in [5C|. Using these
estimates, the studies focused on inferring the collective
excess entropies rather than the single-agent excess entropies
(N = 1). For several rules (including rule 22,
studied with Monte Carlo estimates), the temporal entropy
rate estimates H_{\mu,N}(k) (for N > 1) were concluded
to follow a power law decay to their asymptote H_{\mu,N}:

H_{\mu,N}(k) = H_{\mu,N} + C/k^{\alpha}    (14)



with exponent \alpha < 1 (C is a constant). This is significant
because with \alpha < 1 the collective excess entropy
(known as effective measure complexity in [27]) is divergent,
implying a highly complex process. This case has
been described as "a phenomenon which can occur in
more complex environments", as with strong long-range
correlations a semi-infinite sequence "could store an infinite
amount of information about its continuation" [sH
(as per the predictive information form of the excess entropy,
Eq. (13)). Rule 22 was inferred to have H_{\mu,N} = 0
and infinite excess entropy, which can be interpreted as a
process requiring an infinite amount of memory to maintain
an aperiodicity [26]. Alternative methods for computing
two-dimensional excess entropies, which would be
applicable for computing the collective excess entropy in
CAs, were presented by Feldman and Crutchfield in ^2].
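
The divergence claim can be checked numerically. Substituting the power law of Eq. (14) into the sum of Eq. (12) leaves a sum of terms C/k^\alpha. The sketch below accumulates that sum up to a cutoff; the function name is hypothetical, and starting the sum at k = 1 is an assumption made for illustration, since the power-law term is undefined at k = 0.

```python
def excess_entropy_partial_sum(C, alpha, k_max):
    """Partial sum of C / k**alpha for k = 1..k_max.

    This is the excess term of Eq. (14) accumulated as in Eq. (12);
    the sum starts at k = 1 (an assumption, as C / k**alpha is undefined at 0).
    """
    return sum(C / k ** alpha for k in range(1, k_max + 1))
```

With an exponent below 1 the partial sums keep growing as the cutoff is raised (roughly like k_max^(1 - \alpha)), whereas with an exponent above 1 they level off at a finite value; only the former corresponds to a divergent excess entropy.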

In attempting to quantify local information dynamics 
of distributed computation here, our focus is on infor- 
mation storage for single agents or cells rather than the 
joint information storage across the collective. Were such 
power-law trends to exist for the single-agent case, they 
may be more significant than for the collective case: di- 
vergent collective excess entropy implies that the collec- 
tive is at least trivially utilizing all of its available mem- 
ory (and even the chaotic rule 30 exhibits this), whereas 
divergent single-agent excess entropy implies that all 
agents are individually highly utilizing the resources of 
the collective in a highly complex process. One could go 
on to study the entropy rate convergence for single agents 
(N = 1) [7Qj; however, any findings would be subject to
the problems with overall or averaged measures described 
earlier. Again, we emphasize that our focus is on local 
measures in time as well as space, which we present in 
the next section. 

First though we note that with respect to CAs, where 
each cell has only a finite number of states h and takes 
direct influence from only its single past state and the 
states of a finite number of neighbors, the meaning of (either
average or local) information storage being greater than
log2 h bits (let alone infinite) in the state "process" of a
single cell is not immediately obvious. Clearly, a cell in 
an ECA cannot store more than 1 bit of information in 
isolation. However, the bidirectional communication in 
CAs effectively allows a cell to store extra information in 
neighbors (even beyond the immediate neighbors), and to 
subsequently retrieve that information from those neigh- 
bors at a later point in time. While measurement of 
the excess entropy does not explicitly look for such self- 
influence communicated through neighbors, it is indeed 



the method by which a significant portion of information 
is channeled. Considering the predictive information interpretation
in Eq. (13), it is easy to picture self-influence
between semi-infinite past and future blocks being con- 
veyed via neighbors (see Fig. 1(a)). This is akin to the 
use of stigmergy (indirect communication through the 
environment, e.g. see [53^) to communicate with oneself. 

A measurement of more than log2 h bits stored by a cell 
on average, or indeed an infinite information storage, is 
then a perfectly valid result: in an infinite CA, each cell 
has access to an infinite number of neighbors in which to
store information which can later be used to influence its
own future. Note however, that since the storage medium 
is shared by all cells, one should not think about the total 
memory as the total number of cells multiplied by this 
average. The total memory would be properly measured 
by the collective excess entropy, which takes into account 
the inherent redundancy here. 



B. Local excess entropy 

As discussed previously, we now shift focus to local 
measures of information storage, which have the poten- 
tial to provide more detailed insights into information 
storage structures and their involvement in computation 
than single ensemble measures. 

The local excess entropy is a measure of how much information a given agent is currently storing at a particular point in time. To derive it, note that the excess entropy of a process is actually the expectation value of the local excess entropy for the process at every time step [131 -[zll. The local excess entropy $e_X(n+1)$ of a process is simply the log term from inside the mutual information expansion (as per Eq. (4)) of the predictive information formulation of excess entropy in Eq. (13), evaluated for the semi-infinite past and future at the given time step n + 1:

$$ e_X(n+1) = \lim_{k \to \infty} \log_2 \frac{p(x_n^{(k)}, x_{n+1}^{(k^+)})}{p(x_n^{(k)}) \, p(x_{n+1}^{(k^+)})}. \qquad (15) $$

Note that the excess entropy is the average of the local values, $E_X = \langle e_X(n) \rangle$, and that by convention we use lower-case symbols to denote local values. The limit $k \to \infty$ is an important part of this definition, since correlations at all time scales should be included in the computation of information storage. Since this is not computationally feasible in general, we retain the notation $e_X(n+1, k)$ to denote finite-k estimates of $e_X(n+1)$.

The notation is generalized for lattice systems (such as CAs) with spatially-ordered agents to represent the local excess entropy for cell $X_i$ at time n + 1 as:

$$ e(i, n+1) = \lim_{k \to \infty} \log_2 \frac{p(x_{i,n}^{(k)}, x_{i,n+1}^{(k^+)})}{p(x_{i,n}^{(k)}) \, p(x_{i,n+1}^{(k^+)})}. \qquad (16) $$

FIG. 1: Measures of single-agent information storage in distributed systems. (a) Excess entropy: total information from the cell's past that is relevant to predicting its future. (b) Active information storage: the information storage that is currently in use in determining the next state of the cell. The stored information can be conveyed directly through the cell itself or via neighboring cells.


Again, e(i, n + 1, k) is used to denote finite-k estimates of e(i, n + 1). Local excess entropy is defined for every spatiotemporal point (i, n) in the system. (Alternatively, the collective excess entropy can only be localized in time.)

While the average excess entropy is always non-negative, the local excess entropy may in fact be positive or negative, meaning the past history of the cell can either positively inform us or actually misinform us about its future. An observer is misinformed where the semi-infinite past and future are relatively unlikely to be observed together as compared to their independent likelihoods.
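
As an illustration of the finite-k estimate e(i, n + 1, k), the following Python sketch evaluates the log ratio of Eq. (16) for a single binary time series, using plug-in (relative frequency) probabilities. The function names `block_codes` and `local_excess_entropy` and the period-2 test series are assumptions made for this illustration; they are not taken from the implementation used for the results in this paper.

```python
import numpy as np

def block_codes(x, k):
    """Integer code of every length-k window x[m], ..., x[m+k-1] of a binary series."""
    x = np.asarray(x, dtype=np.int64)
    n_windows = len(x) - k + 1
    codes = np.zeros(n_windows, dtype=np.int64)
    for d in range(k):
        codes = codes * 2 + x[d:d + n_windows]
    return codes

def local_excess_entropy(x, k):
    """Finite-k local excess entropy (bits) from plug-in probabilities.

    Entry m of the result is the value at time step n + 1 = m + k: the past
    block covers x[m..m+k-1] and the future block covers x[m+k..m+2k-1].
    """
    w = block_codes(x, k)
    past, future = w[:-k], w[k:]          # aligned (past, future) block pairs
    joint = past * 2**k + future
    n_obs = len(joint)
    c_past = np.bincount(past, minlength=2**k)
    c_future = np.bincount(future, minlength=2**k)
    c_joint = np.bincount(joint, minlength=4**k)
    # log2 [ p(past, future) / (p(past) p(future)) ], written with counts
    return np.log2(c_joint[joint] * n_obs / (c_past[past] * c_future[future]))

# Hypothetical test series: a period-2 sequence 0, 1, 0, 1, ...
x = [0, 1] * 50 + [0]
e = local_excess_entropy(x, k=3)
```

For this period-2 series every local value is 1 bit: the past block identifies the temporal phase, which is exactly the information it holds about the future block.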



C. Active information storage 

The excess entropy measures the total stored information which will be used at some point in the future of the state process of an agent, possibly but not necessarily at the next time step n + 1. In examining the local information dynamics of computation, we are interested in how much of the stored information is actually in use at the next time step. As we will see in Section VI, this is particularly important in understanding how stored information interacts with information transfer in information processing. As such, we derive active information storage $A_X$ as the average mutual information between the semi-infinite past of the process and its next state, as opposed to its whole (semi-infinite) future:

$$ A_X = \lim_{k \to \infty} I(x^{(k)}; x'). \qquad (17) $$

The local active information storage is then a measure of the amount of information storage in use by the process at a particular time-step n + 1:

$$ a_X(n+1) = \lim_{k \to \infty} \log_2 \frac{p(x_n^{(k)}, x_{n+1})}{p(x_n^{(k)}) \, p(x_{n+1})}, \qquad (18) $$

and we have $A_X = \langle a_X(n) \rangle$. We retain the notation $a_X(n+1, k)$ for finite-k estimates. Again, we generalize the measure for agent $X_i$ in a lattice system as:

$$ a(i, n+1) = \lim_{k \to \infty} \log_2 \frac{p(x_{i,n}^{(k)}, x_{i,n+1})}{p(x_{i,n}^{(k)}) \, p(x_{i,n+1})}, \qquad (19) $$

and use a(i, n + 1, k) to denote finite-k estimates there, noting that the local active information storage is defined for every spatiotemporal point (i, n) in the lattice system.

The average active information storage will always be non-negative (as for the excess entropy), but is bounded above by log2 b bits where the agent only takes b discrete states. The local active information storage is not bound in this manner however, with values larger than log2 b indicating that the particular past of an agent provides strong positive information about its next state. Furthermore, the local active information storage can be negative, where the past history of the agent is actually misinformative about its next state. An observer is misinformed where the past history and observed next state are relatively unlikely to occur together as compared to their separate occurrence.



D. Local information storage results


In order to evaluate the local measures within sample CA runs, we estimate the required probability distribution functions from CA runs of 10 000 cells, initialized from random states, with 600 time steps retained (after the first 30 time steps were eliminated to allow the CA to settle). Alternatively, for $\phi_{par}$ we used 1000 cells with 1000 time steps retained. Periodic boundary conditions were used. Observations taken at every spatiotemporal point in the CA were used in estimating the required probability distribution functions, since the cells in the CA are homogeneous agents. All results were confirmed by at least 10 runs from different initial states.

We make estimates of the measures with finite values of k, noting that the insights described here could not be attained unless a reasonably large value of k was used in order to capture a large proportion of the correlations. Determination of an appropriate value of k was discussed in [55] for the related transfer entropy measure, presented in Section V. As a rule of thumb, k should at least be larger than the period of any regular background domain in order to capture the information storage underpinning its continuation.
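
The estimation procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used to produce the results here: the helper names `run_eca` and `local_active_info` are hypothetical, and the run is deliberately smaller (1000 cells, 300 retained steps, k = 8) than the 10 000 cells, 600 steps and k = 16 quoted above so that it executes quickly. Probabilities are relative frequencies pooled over all cells and time steps, relying on the homogeneity of the cells.

```python
import numpy as np

def run_eca(rule, width, steps, seed=0):
    """Evolve an elementary CA (periodic boundaries) from a random initial row."""
    rng = np.random.default_rng(seed)
    table = np.array([(rule >> v) & 1 for v in range(8)], dtype=np.int64)
    cells = np.empty((steps, width), dtype=np.int64)
    cells[0] = rng.integers(0, 2, size=width)
    for n in range(1, steps):
        prev = cells[n - 1]
        # neighborhood value = 4 * left + 2 * center + right
        cells[n] = table[4 * np.roll(prev, 1) + 2 * prev + np.roll(prev, -1)]
    return cells

def local_active_info(cells, k):
    """Finite-k local active information storage a(i, n + 1, k) in bits.

    Row m of the result is time step n + 1 = m + k. Probabilities are
    relative frequencies pooled over every cell and time step.
    """
    steps = cells.shape[0]
    past = np.zeros((steps - k, cells.shape[1]), dtype=np.int64)
    for d in range(k):                      # encode the k past states as one integer
        past = past * 2 + cells[d:d + steps - k]
    nxt = cells[k:]
    joint = past * 2 + nxt
    c_past = np.bincount(past.ravel(), minlength=2**k)
    c_next = np.bincount(nxt.ravel(), minlength=2)
    c_joint = np.bincount(joint.ravel(), minlength=2**(k + 1))
    # log2 [ p(past, next) / (p(past) p(next)) ], written with counts
    return np.log2(c_joint[joint] * joint.size / (c_past[past] * c_next[nxt]))

# Discard the first 30 steps to let the CA settle, as in the text.
cells = run_eca(rule=54, width=1000, steps=330)[30:]
a = local_active_info(cells, k=8)
average_a = a.mean()    # plug-in estimate of the average active information storage
```

The array `a` can be split into its positive and negative parts for plotting. Its mean is a plug-in mutual information, so it lies between 0 and 1 bit for a binary CA even though individual local values need not.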

We begin by examining the results for rules 54 and 110, which contain regular gliders against periodic background domains. For the CA evolutions in Fig. 2(a) and Fig. 4(a), the local profiles of e(i, n, k = 8) generated are displayed in Fig. 2(b) and Fig. 4(b) (positive values only), and the local profiles of a(i, n, k = 16) in Fig. 2(c) and Fig. 4(c) (positive values only). It is quite clear that positive information storage is concentrated in the vertical gliders or blinkers, and the domain regions. As expected, these results provide quantitative evidence that the blinkers are the dominant information storage entities. That the domain regions contain significant information storage should not be surprising, since the past of a periodic sequence does indeed store information about its future. In fact, the local values for each measure form spatially and temporally periodic patterns in the domains, due to the spatial and temporal periodicities exhibited there. While the local active information storage indicates a similar amount of stored information in use to compute each space-time point in both the domain and blinker areas, the local excess entropy reveals that a larger total amount of information is stored in the blinkers. For the blinkers known as α and β in rule 54 [43], this is because the temporal sequences of the center columns of the blinkers (0-0-0-1, with e(i, n, k = 8) in the range 5.01 to 5.32 bits) are more complex than those in the domain (0-0-1-1 and 0-1, with e(i, n, k = 8) in the range 1.94 to 3.22 bits), even where they are of the same period. We have e(i, n, k = 8) > 1 bit here due to the distributed information storage supported by bidirectional communication (as discussed earlier): this also supports the period-7 domain in rule 110. Another area of strong information storage appears to be the "wake" of the more complex gliders in rule 110 (see the glider at top left of Fig. 4(b) and Fig. 4(c)). This result aligns well with our observation in [55] that the dynamics following the leading edge of regular gliders consists largely of "non-traveling" information. The presence of the information storage is shown by both measures, although the relative strength of the total information storage is again revealed only by the local excess entropy.

Negative values of a(i, n, k = 16) for rules 54 and 110 are displayed in Fig. 2(d) and Fig. 4(d). Interestingly, negative local components of the local active information storage measure are concentrated in the traveling glider areas (e.g. $\gamma^+$ and $\gamma^-$ for rule 54 [43]), providing a good spatiotemporal filter for the glider structure. This is because when a traveling glider is encountered at a given cell, the past history of that cell (being part of the background domain) is misinformative about the next state, since the domain sequence was more likely to continue than be interrupted. For example, see the marked positions of the γ gliders in Fig. 3. There we have $p(x_{n+1} \mid x_n^{(k=16)}) = 0.25$ and $p(x_{n+1}) = 0.52$: since the next state occurs relatively infrequently after the given history, we have a misinformative a(n, k = 16) = -1.09 bits. This is juxtaposed with the points four time steps before those marked "x", which have the same history $x_n^{(k=16)}$ but are part of the domain, with $p(x_{n+1} \mid x_n^{(k=16)}) = 0.75$ and $p(x_{n+1}) = 0.48$, giving a(n, k = 16) = 0.66 bits, quantifying the positive information storage there. Note that the points with misinformative information storage are not necessarily those selected by other filtering techniques as part of the gliders: e.g. the finite state transducers technique (using left to right scanning by convention) [56] would identify points 3 cells to the right of those marked "x" as part of the $\gamma^+$ glider.
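
The two local values quoted for the glider and domain points can be checked by hand. Dividing the numerator and denominator of Eq. (19) by the probability of the past block gives the equivalent form $a = \log_2 [\, p(x_{n+1} \mid x_n^{(k)}) / p(x_{n+1}) \,]$, which the short Python sketch below evaluates. The function name `local_storage_bits` is a hypothetical helper for this illustration. Because the probabilities quoted in the text are rounded to two decimal places, the results only approximate the reported -1.09 and 0.66 bits.

```python
from math import log2

def local_storage_bits(p_next_given_past, p_next):
    """Local active information storage, log2 of p(next | past) / p(next), in bits."""
    return log2(p_next_given_past / p_next)

# Rounded probabilities quoted in the text for rule 54 with k = 16.
a_glider = local_storage_bits(0.25, 0.52)   # next state unlikely given the past
a_domain = local_storage_bits(0.75, 0.48)   # next state likely given the past

print(f"{a_glider:.2f} {a_domain:.2f}")     # prints "-1.06 0.64"
```

The sign of each value follows directly from whether the conditional probability of the next state is below or above its unconditional probability.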

The local excess entropy produced some negative values around traveling gliders (results not shown), though these were far less localized on the gliders themselves and less consistent in occurrence than for the local active information storage. This is because the local excess entropy, as a measure of total information storage into the future, is more loosely tied to the dynamics at the given spatiotemporal point. The effect of a glider encounter on e(i, n, k) is smeared out in time, and in fact the dynamics may store more positive information in total than the misinformation encountered at the specific location of the glider. (For example, glider pairs in Fig. 4(b) have positive total information storage, since a glider encounter becomes much more likely in the wake of a previous glider.)

As another rule containing regular gliders against a periodic background domain, analysis of the raw states of $\phi_{par}$ in Fig. 5(a) provides similar results for e(i, n, k = 5) in Fig. 5(b) and a(i, n, k = 10) in Fig. 5(c) and Fig. 5(d) here. One distinction is that the blinker here contains no more stored information than the domain, since it is no more complicated. Importantly, we confirm the information storage capability of the blinkers and domains in this human-understandable computation.

FIG. 2: Local information dynamics in rule 54 (35 time steps displayed for 35 cells, time increases down the page for all CA plots): (a) Raw CA; (b) Local excess entropy e(i, n, k = 8), positive values only (all figures gray-scale with 16 levels), with max. 11.79 bits (black), min. 0.00 bits (white); Local active information a(i, n, k = 16): (c) positive values only, max. 1.07 bits (black), min. 0.00 bits (white), (d) negative values only, max. 0.00 bits (white), min. -12.27 bits (black); Local apparent transfer entropy t(i, j = 1, n, k = 16) (one cell to the right): (e) positive values only, max. 7.93 bits (black), min. 0.00 bits (white), (f) negative values only, max. 0.00 bits (white), min. -4.04 bits (black); Local separable information s(i, n, k = 16): (g) positive values only, max. 8.40 bits (black), min. 0.00 bits (white), (h) negative values only, max. 0.00 bits (white), min. -5.27 bits (black); (i) Positions of s(i, n, k = 16) < 0 marked against the local collective transfer entropy profile t(i, n, k = 16).

FIG. 3: Close up of raw states of rule 54. "x" and "+" mark some positions in the $\gamma^+$ and $\gamma^-$ gliders respectively. Note their point of coincidence in collision type "A", with "o" marking the subsequent non-trivial information modification as detected using s(i, n, k = 16) < 0.

Another interesting example is provided by ECA rule 18, which contains domain walls against a seemingly irregular background domain. We measured the local values for e(i, n, k = 8) and a(i, n, k = 16) (see Fig. 6(b), Fig. 6(d) and Fig. 6(e)) for the raw states of rule 18 displayed in Fig. 6(a). Importantly, the most significant negative components of the local active information storage (see Fig. 6(e)) are concentrated on the domain walls (in particular, where the domain walls are not stationary): analogous to the regular gliders of rule 54, when a domain wall is encountered the past history of the cell becomes misinformative about its next state. Again, the negative components of a(i, n, k = 16) appear to be a good filter for moving spatiotemporal structure. Interestingly, in contrast to rules 54 and 110, the background domain for rule 18 contains a significant number of points with negative local active information storage as well as points with positive values (Fig. 6(d)). Considering these components together, we observe a pattern to the background domain of spatial and temporal period 2, corresponding to the period-2 ε-machine generated to recognize the background domain for ECA rule 18 by Hanson and Crutchfield [22]. Every second site in the domain is a "0", and contains a small positive a(i, n, k = 16) (0.43 to 0.47 bits, by limited inspection); information storage of this primary temporal phase of the period is enough to predict the next state here. The alternate site is either a "0" or a "1", and contains either a small negative a(i, n, k = 16) at the "0" sites (-0.45 to -0.61 bits, by limited inspection) or a larger positive a(i, n, k = 16) at the "1" sites (0.98 to 1.09 bits, by limited inspection). Information storage of this alternate temporal phase is strongly in use or active in computing the "1" sites since the "1" sites only occur in the alternate phase. However, the information storage indicating the alternate temporal phase is misleading in computing the "0" sites since they occur more frequently with the primary phase. Domain walls are points where the spatiotemporal domain pattern is violated, with strong negative components of the local active information storage revealing the traveling points of this violation structure.

The local excess entropy profile on the other hand contains both positive (Fig. 6(b)) and negative values (Fig. 6(c)) for the domain walls. As per the results for gliders, these negative values are less specifically localized on the domain walls than observed for a(i, n, k). Negative values of e(i, n, k = 8) can similarly be understood as the encountering of a moving domain wall rendering the past misinformative regarding the future dynamics. Strong positive values of e(i, n, k = 8) however are observed to occur where the domain wall makes several changes of direction during the k steps but is somewhat stationary on average: again, this result is similar to pairs of regular gliders, i.e. a domain wall encounter is much more likely in the wake of previous domain wall movement than elsewhere in the CA. In theory, the background domain should contain a consistent level of excess entropy at 1 bit to store the temporal phase information, and this occurs for most points (the exceptions are where long temporal chains of 0's occur, disturbing the memory of the phase due to finite-k effects). Again, this resembles a smearing out of the local periodicity of the active information storage.

Finally, we examine ECA rule 22, suggested to have infinite collective excess entropy [27], [isl but without any known coherent structural elements [21]. For the raw states of rule 22 displayed in Fig. 7(a), the calculated local excess entropy profile is shown in Fig. 7(b) (positive components only), and the local active information storage profile in Fig. 7(c) (positive components) and Fig. 7(d) (negative components). While information storage is certainly observed to occur for rule 22, these plots provide evidence that there is no coherent structure to this storage. This is another clear example of the utility of examining local information dynamics over ensemble estimates, given the earlier discussion on collective excess entropy for rule 22.

In summary, we have demonstrated that the local ac- 
tive information storage and local excess entropy provide 
insights into information storage dynamics that, while 
often similar in general, are sometimes subtly different. 
While both measures provide useful insights, the local 
active information storage is the most useful in a real- 
time sense, since calculation of the local excess entropy 
requires knowledge of the dynamics an arbitrary distance 
into the future. [T^] Furthermore, it also provides the most 
specifically localized insights, including filtering moving 
elements of coherent spatiotemporal structure. This be- 
ing said, it is not capable of identifying the information 
source of these structures; for this, we turn our attention 
to a specific measure of information transfer. 



V. INFORMATION TRANSFER 

Information transfer refers to a directional signal or communication of dynamic information from a source to a destination. We have previously described how to measure information transfer in complex systems, and applied this to CA rules 110 and 18, in [53]. Here, we summarize this work and discuss interpretations from the perspective of computation, present more examples, and introduce collective transfer entropy.

FIG. 4: Local information dynamics in rule 110 (55 time steps displayed for 55 cells): (b) Local excess entropy, positive values only, max. 10.01 bits (black), min. 0.00 bits (white); Local active information a(i, n, k = 16): (c) positive values only, max. 1.22 bits (black), min. 0.00 bits (white), (d) negative values only, max. 0.00 bits (white), min. -9.21 bits (black); Local apparent transfer entropy t(i, j = -1, n, k = 16) (one cell to the left): (e) positive values only, max. 10.43 bits (black), min. 0.00 bits (white), (f) negative values only, max. 0.00 bits (white), min. -6.0 bits (black); (g) Local temporal entropy rate, max. 10.43 bits (black), min. 0.00 bits (white); Local separable information s(i, n, k = 16): (h) positive values only, max. 5.47 bits (black), min. 0.00 bits (white), (i) negative values only, max. 0.00 bits (white), min. -5.20 bits (black).

FIG. 5: Local information dynamics in r = 3 rule $\phi_{par}$ (86 time steps displayed for 86 cells): (a) Raw CA; (b) Local excess entropy e(i, n, k = 5), positive values only, max. 11.76 bits (black), min. 0.00 bits (white); Local active information a(i, n, k = 10): (c) positive values only, max. 1.52 bits (black), min. 0.00 bits (white), (d) negative values only, max. 0.00 bits (white), min. -9.41 bits (black); Local apparent transfer entropy: (e) one cell to the right, positive values only, max. 9.24 bits (black), min. 0.00 bits (white), (f) three cells to the right, positive values only, max. 10.45 bits (black), min. 0.00 bits (white); (g) Local temporal entropy rate, max. 10.92 bits (black), min. 0.00 bits (white); Local separable information s(i, n, k = 10): (h) positive values only, max. 29.26 bits (black), min. 0.00 bits (white), (i) negative values only, max. 0.00 bits (white), min. -18.68 bits (black).

FIG. 6: Local information dynamics in rule 18 (67 time steps displayed for 67 cells): (a) Raw CA; Local excess entropy e(i, n, k = 8): (b) positive values only, (c) negative values only, max. 0.00 bits (white), min. -8.65 bits (black); Local active information a(i, n, k = 16): (d) positive values only, max. 1.98 bits (black), min. 0.00 bits (white), (e) negative values only, max. 0.00 bits (white), min. -9.92 bits (black); (f) Local apparent transfer entropy (one cell to the right), positive values only, max. 11.90 bits (black), min. 0.00 bits (white); (g) Local temporal entropy rate, max. 11.90 bits (black), min. 0.00 bits (white); Local separable information s(i, n, k = 16): (h) positive values only, max. 1.98 bits (black), min. 0.00 bits (white), (i) negative values only, max. 0.00 bits (white), min. -14.37 bits (black).

FIG. 7: Local information dynamics in rule 22 (67 time steps displayed for 67 cells): (b) Local excess entropy, positive values only, max. 4.49 bits (black), min. 0.00 bits (white); Local active information: (c) positive values only, max. 1.51 bits (black), min. 0.00 bits (white), (d) negative values only, max. 0.00 bits (white), min. -8.17 bits (black); Local apparent transfer entropy (one cell to the right): (e) positive values only, max. 9.68 bits (black), min. 0.00 bits (white), (f) negative values only, max. 0.00 bits (white), min. 15 bits (black); (g) Local temporal entropy rate, max. 9.68 bits (black), min. 0.00 bits (white); Local separable information: (h) positive values only, max. 5.03 bits (black), min. 0.00 bits (white), (i) negative values only, max. 0.00 bits (white), min. -14.44 bits (black).



A. Local transfer entropy 

Schreiber presented transfer entropy as a measure for information transfer [s^ in order to address deficiencies in the previous de facto measure, mutual information (Eq. (4)), the use of which he criticized in this context as a symmetric measure of statically shared information. Transfer entropy is defined as the deviation from independence (in bits) of the state transition of an information destination X from the previous state(s) of an information source Y:

$$ T_{Y \to X} = \sum_{w_n} p(w_n) \log_2 \frac{p(x_{n+1} \mid x_n^{(k)}, y_n^{(l)})}{p(x_{n+1} \mid x_n^{(k)})}, \qquad (20) $$

where $w_n$ is the state transition tuple $(x_{n+1}, x_n^{(k)}, y_n^{(l)})$. As such, the transfer entropy is a directional, dynamic measure of information transfer. It can be viewed as a conditional mutual information, casting it as the average information in the source about the next state of the destination that was not already contained in the destination's past.

In [55], we demonstrated that the transfer entropy metric is an average (or expectation value) of a local transfer entropy at each observation n, i.e. $T_{Y \to X} = \langle t_{Y \to X}(n+1) \rangle$ where:

$$ t_{Y \to X}(n+1) = \log_2 \frac{p(x_{n+1} \mid x_n^{(k)}, y_n^{(l)})}{p(x_{n+1} \mid x_n^{(k)})}. \qquad (21) $$

For lattice systems such as CAs with spatially-ordered agents, the local information transfer to agent $X_i$ from $X_{i-j}$ at time n + 1 is represented as:

$$ t(i, j, n+1, k, l) = \log_2 \frac{p(x_{i,n+1} \mid x_{i,n}^{(k)}, x_{i-j,n}^{(l)})}{p(x_{i,n+1} \mid x_{i,n}^{(k)})}. \qquad (22) $$

Using l = 1 is sensible for systems such as CAs where only the previous source state is a causal information contributor to the destination; in this case we drop l from the notation: t(i, j, n + 1, k). Given our focus on CAs, from here onwards we consider only l = 1 in this paper. This information transfer t(i, j, n + 1, k) to agent $X_i$ from $X_{i-j}$ at time n + 1 is illustrated in Fig. 8(a). t(i, j, n, k) is defined for every spatiotemporal destination (i, n), for every information channel or direction j; sensible values for j correspond to causal information sources, i.e. for CAs, sources within the cell range $|j| \le r$. Again, for homogeneous agents it is appropriate to estimate the probability distributions used in Eq. (22) from all spatiotemporal observations, and we write the average across homogeneous agents as $T(j, k) = \langle t(i, j, n, k) \rangle$.

It is important to note that the destination's own historical values can indirectly influence it via the source or other neighbors: this may be mistaken as an independent flow from the source here. In [55] we referred to this self-influence as non-traveling information, making analogies to standing waves. In the context of distributed computation, it is recognizable as the active information storage. That is, conditioning on the destination's history $x_n^{(k)}$ serves to eliminate the active information storage from the transfer entropy measurement. Yet any self-influence transmitted prior to these k values will not be eliminated: in [55] we generalized comments on the entropy rate in [57] to suggest that the asymptote $k \to \infty$ is most correct for agents displaying non-Markovian dynamics. Just as the excess entropy and active information storage require $k \to \infty$ to capture all information storage, accurate measurement of the transfer entropy requires $k \to \infty$ to eliminate all information storage from being mistaken as information transfer. The most correct form of the transfer entropy is therefore computed as:

$$ t(i, j, n+1) = \lim_{k \to \infty} \log_2 \frac{p(x_{i,n+1} \mid x_{i,n}^{(k)}, x_{i-j,n})}{p(x_{i,n+1} \mid x_{i,n}^{(k)})}, \qquad (23) $$

with t(i, j, n + 1, k) retained for finite-k estimates.

Additionally, the transfer entropy may be conditioned on other possible causal information sources, to eliminate their influence from being attributed to the source in question Y (STj. In general, this means conditioning on all sources Z in the set of causal information contributors V (except for Y) with joint state $v_{y,n}$, giving the local complete transfer entropy [55]:

$$ t^c_{Y \to X}(n+1, k) = \log_2 \frac{p(x_{n+1} \mid x_n^{(k)}, y_n, v_{y,n})}{p(x_{n+1} \mid x_n^{(k)}, v_{y,n})}, \qquad (24) $$

$$ v_{y,n} = \{ z_n \mid \forall Z \in V, \; Z \ne Y, X \}. \qquad (25) $$

For ECAs this means conditioning on other sources $v^r_{i,j,n}$ in the neighborhood of the destination to obtain [55]:

$$ t^c(i, j, n+1, k) = \log_2 \frac{p(x_{i,n+1} \mid x_{i,n}^{(k)}, x_{i-j,n}, v^r_{i,j,n})}{p(x_{i,n+1} \mid x_{i,n}^{(k)}, v^r_{i,j,n})}, \qquad (26) $$

$$ v^r_{i,j,n} = \{ x_{i+q,n} \mid \forall q : -r \le q \le +r, \; q \ne -j, 0 \}. \qquad (27) $$

Again, the most correct form is $t^c(i, j, n+1)$ in the limit $k \to \infty$. In deterministic systems (e.g. CAs), complete conditioning renders $t^c(i, j, n) \ge 0$ because the source can only add information about the outcome of the destination. Calculations conditioned on no other information contributors (as in Eq. (22)) are labeled as apparent transfer entropy. Local apparent transfer entropy t(i, j, n) may be either positive or negative, with negative values occurring where (given the destination's history) the source element is actually misleading about the next state of the destination.

FIG. 8: (a) Transfer entropy t(i, j, n + 1, k): information contained in the source cell $X_{i-j}$ about the next state of the destination cell $X_i$ at time n + 1 that was not contained in the destination's past. (b) Separable information s(i, n + 1, k): information gained about the next state of the destination from separately examining each causal information source in the context of the destination's past. For ECAs these causal sources are within the cell range r.
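
To make the finite-k local apparent transfer entropy t(i, j, n + 1, k) concrete, the Python sketch below estimates it for an ECA from pooled relative frequencies. It is a minimal illustration under stated assumptions rather than the implementation behind the reported results: `run_eca` and `local_apparent_te` are hypothetical helper names, and the run size and k = 8 are chosen for speed. The source of destination cell i is taken to be cell i - j at time n, so j = 1 corresponds to transfer one cell to the right.

```python
import numpy as np

def run_eca(rule, width, steps, seed=0):
    """Evolve an elementary CA (periodic boundaries) from a random initial row."""
    rng = np.random.default_rng(seed)
    table = np.array([(rule >> v) & 1 for v in range(8)], dtype=np.int64)
    cells = np.empty((steps, width), dtype=np.int64)
    cells[0] = rng.integers(0, 2, size=width)
    for n in range(1, steps):
        prev = cells[n - 1]
        cells[n] = table[4 * np.roll(prev, 1) + 2 * prev + np.roll(prev, -1)]
    return cells

def local_apparent_te(cells, k, j):
    """Finite-k local apparent transfer entropy t(i, j, n + 1, k) in bits.

    Row m of the result is time step n + 1 = m + k; the source state is
    x_{i-j,n} (history length l = 1).
    """
    steps = cells.shape[0]
    past = np.zeros((steps - k, cells.shape[1]), dtype=np.int64)
    for d in range(k):                      # destination history x_{i,n}^{(k)}
        past = past * 2 + cells[d:d + steps - k]
    nxt = cells[k:]                                     # x_{i,n+1}
    src = np.roll(cells[k - 1:steps - 1], j, axis=1)    # x_{i-j,n}
    past_src = past * 2 + src
    past_nxt = past * 2 + nxt
    full = past_src * 2 + nxt
    c_past = np.bincount(past.ravel())
    c_past_src = np.bincount(past_src.ravel())
    c_past_nxt = np.bincount(past_nxt.ravel())
    c_full = np.bincount(full.ravel())
    # log2 [ p(next | past, source) / p(next | past) ], written with counts
    return np.log2(c_full[full] * c_past[past]
                   / (c_past_src[past_src] * c_past_nxt[past_nxt]))

cells = run_eca(rule=54, width=1000, steps=330)[30:]   # drop 30 settling steps
t_right = local_apparent_te(cells, k=8, j=1)
```

Local values in `t_right` may take either sign, while their mean is a plug-in conditional mutual information and therefore cannot be negative.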



B. Total information, entropy rate and collective 
information transfer 



The total information required to predict the next state of any agent i is the local entropy h(i, n + 1), where the entropy is the average of these local values: $H(X_i) = \langle h(i, n+1) \rangle$. Similarly, the local temporal entropy rate $h_\mu(i, n+1, k)$ is the information required to predict the next state of agent i given that agent's past, and the entropy rate is the average of these local values: $H_\mu(X_i, k) = \langle h_\mu(i, n+1, k) \rangle$. As demonstrated in Appendix A, the local entropy can be considered as the sum of the local active information storage a(i, n + 1, k) and local temporal entropy rate:

$$ h(i, n+1) = a(i, n+1, k) + h_\mu(i, n+1, k). \qquad (28) $$

For deterministic systems (e.g. CAs) there is no intrinsic uncertainty, so the local temporal entropy rate is equal to the local collective transfer entropy (see Appendix A) and represents a collective information transfer: the information about the next state of the destination jointly added by the causal information sources that was not contained in the past of the destination. Also, Appendix A shows that (via a sum of incrementally conditioned mutual information terms) for ECAs we have:

$$ h(i, n+1) = a(i, n+1, k) + t(i, j = -1, n+1, k) + t^c(i, j = 1, n+1, k), \qquad (29) $$

(and vice-versa in j = 1, -1).

Clearly, this total information is not simply the sum of the active information storage and the apparent transfer entropy from each source, nor the sum of the active information storage and the complete transfer entropy from each source. In earlier work [55], we demonstrated that the sum of transfer entropies from each source (either apparent or complete) formed a useful single spatiotemporal filter for emergent structure in CAs, whereas the transfer entropy from each source displays information transfer in one given direction only. Given Eq. (28), we suggest that the local collective transfer entropy (or simply the local temporal entropy rate $h_\mu(i, n, k)$ for deterministic systems) is likely to be a more meaningful measure and filter for this purpose.
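
The decompositions in Eq. (28) and Eq. (29) can be checked numerically at every spatiotemporal point. The Python sketch below does so for an ECA using plug-in probabilities estimated from a single run. The helpers `run_eca` and `cond_prob`, the choice of rule 110, and the small run size and k = 6 are assumptions made for illustration only. With plug-in estimates taken from the same sample, both identities hold exactly (up to floating-point error), because the log ratios telescope and the deterministic rule gives probability 1 to the next state once the full neighborhood is known.

```python
import numpy as np

def run_eca(rule, width, steps, seed=0):
    """Evolve an elementary CA (periodic boundaries) from a random initial row."""
    rng = np.random.default_rng(seed)
    table = np.array([(rule >> v) & 1 for v in range(8)], dtype=np.int64)
    cells = np.empty((steps, width), dtype=np.int64)
    cells[0] = rng.integers(0, 2, size=width)
    for n in range(1, steps):
        prev = cells[n - 1]
        cells[n] = table[4 * np.roll(prev, 1) + 2 * prev + np.roll(prev, -1)]
    return cells

def cond_prob(target, *conds):
    """Plug-in estimate of p(target | conds) at every observation."""
    ctx = np.zeros_like(target)
    for c in conds:                         # merge conditioning variables into one code
        ctx = ctx * (c.max() + 1) + c
    joint = ctx * (target.max() + 1) + target
    return np.bincount(joint.ravel())[joint] / np.bincount(ctx.ravel())[ctx]

k = 6
cells = run_eca(rule=110, width=500, steps=230)[30:]
steps = cells.shape[0]
past = np.zeros((steps - k, cells.shape[1]), dtype=np.int64)
for d in range(k):
    past = past * 2 + cells[d:d + steps - k]
nxt = cells[k:]                             # x_{i,n+1}
now = cells[k - 1:steps - 1]                # states at time n
left = np.roll(now, 1, axis=1)              # x_{i-1,n}: source for j = 1
right = np.roll(now, -1, axis=1)            # x_{i+1,n}: source for j = -1

p_next = cond_prob(nxt)
p_past = cond_prob(nxt, past)
p_past_right = cond_prob(nxt, past, right)
p_all = cond_prob(nxt, past, right, left)   # equals 1 in a deterministic ECA

h = -np.log2(p_next)                        # local entropy
a = np.log2(p_past / p_next)                # local active information storage
h_mu = -np.log2(p_past)                     # local temporal entropy rate
t_app = np.log2(p_past_right / p_past)      # apparent transfer entropy, j = -1
t_comp = np.log2(p_all / p_past_right)      # complete transfer entropy, j = 1

eq28_ok = np.allclose(h, a + h_mu)
eq29_ok = np.allclose(h, a + t_app + t_comp)
```

In this sketch `t_comp` is never negative, in line with the remark that complete conditioning in a deterministic system can only add information, whereas `a` and `t_app` may each take either sign at individual points.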



C. Local information transfer results 

Local complete and apparent transfer entropy were applied to ECA rules 110 and 18 in [55]. Here we revisit these and present further examples, focusing on the local apparent transfer entropy: profiles of the positive values of t(i, j = 1, n, k = 16) are plotted for rules 54 (Fig. 2(e)), $\phi_{par}$ (Fig. 5(e)), 18 (Fig. 6(f)) and 22 (Fig. 7(e)), with t(i, j = -1, n, k = 16) plotted for rule 110 (Fig. 4(e)). We also measure the profiles of the local temporal entropy rate $h_\mu(i, n+1, k)$ (which is equal to the local collective transfer entropy in these deterministic systems) here in Fig. 4(g) for rule 110, Fig. 5(g) for $\phi_{par}$, and Fig. 6(g) for rule 18.

Both the local apparent and complete transfer entropy
highlight particles (including gliders and domain
walls) as strong positive information transfer against
background domains. Importantly, the particles are measured
as information transfer in their direction of macroscopic
motion, as expected. For example, at the "x"
marks in Fig. 3 which denote parts of the right-moving
$\gamma^+$ gliders, we have
$p(x_{i,n+1} \mid x_{i,n}^{(k=16)}, x_{i-1,n}) = 1.00$ and
$p(x_{i,n+1} \mid x_{i,n}^{(k=16)}) = 0.25$: there is a strong information
transfer of $t(i, j = 1, n, k = 16) = 2.02$ bits here because
the source (in the glider) added a significant amount of
information to the destination about the continuation of
the glider. For $\phi_{par}$ we confirm the role of the gliders
as information transfer agents in the human-understandable
computation, and demonstrate information transfer
across multiple units of space per unit time step for
fast-moving gliders in Fig. 5(f). Interestingly, we also
see in Fig. 5(e) and Fig. 5(f) that the apparent transfer
entropy can attribute information transfer to several
information sources, whereas the complete transfer entropy
(not shown) is more likely to attribute the transfer
to the single causal source (though we emphasize that
information transfer and causality are distinct concepts).
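
The glider example above reduces to a log-ratio of two conditional probabilities. The following is a minimal sketch of that arithmetic, assuming the standard log-ratio form of the local apparent transfer entropy; the helper name `local_transfer_entropy` is ours and does not come from the paper.

```python
import math

def local_transfer_entropy(p_with_source: float, p_past_only: float) -> float:
    """Local apparent transfer entropy in bits.

    p_with_source: p(next state | destination past, source state)
    p_past_only:   p(next state | destination past)
    """
    return math.log2(p_with_source / p_past_only)

# Probabilities quoted in the text for the right-moving glider (2 decimal places).
t_glider = local_transfer_entropy(1.00, 0.25)
print(f"{t_glider:.2f} bits")  # prints 2.00 bits
```

With the rounded probabilities this gives 2.00 bits; the 2.02 bits reported in the text is consistent with the underlying estimates having been rounded to two decimal places before being quoted.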

As expected, the local temporal entropy rate profiles
$h_\mu(i, n + 1, k)$ highlight particles moving in each relevant
channel and are a useful single spatiotemporal filter for
emergent structure. In fact, these profiles are quite similar
to the profiles of the negative values of local active
information. This is not surprising given they are counterparts
in Eq. (^5): where $h_\mu(i, n + 1, k)$ is strongly positive
(i.e. greater than 1 bit), it is likely that $a(i, n + 1, k)$
is negative since the local single cell entropy will average
close to 1 bit for these examples. Unlike $a(i, n + 1, k)$
however, the local temporal entropy rate $h_\mu(i, n + 1, k)$
is never negative.
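
The relationship invoked here can be written out explicitly. Assuming the usual local definitions (local entropy as the negative log of the marginal probability, and local active information storage as the log-ratio of the history-conditioned probability to that marginal), both the decomposition and the non-negativity of the entropy rate follow directly:

```latex
\begin{align}
h(i, n+1) &= -\log_2 p(x_{i,n+1}), \\
a(i, n+1, k) &= \log_2 \frac{p(x_{i,n+1} \mid x_{i,n}^{(k)})}{p(x_{i,n+1})}, \\
h_\mu(i, n+1, k) &= -\log_2 p(x_{i,n+1} \mid x_{i,n}^{(k)}) \;\geq\; 0, \\
h(i, n+1) &= a(i, n+1, k) + h_\mu(i, n+1, k).
\end{align}
```

Because the conditional probability is at most 1, $h_\mu$ cannot be negative; and wherever $h_\mu$ exceeds $h$ (which is near 1 bit in these examples), $a$ must be negative to balance the last line.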

In [55] we also demonstrated that while there is zero
information transfer in an infinite periodic domain, there is
a small non-zero information transfer in domains acting
as a background to gliders, effectively indicating the absence
of gliders. These small non-zero information transfers
are stronger in the wake of a glider, indicating the
absence of (relatively common) following gliders. Similarly,
we note here that the local temporal entropy rate
profiles $h_\mu(i, n + 1, k)$ contain small but non-zero values
in these periodic domains. Furthermore, there is interesting
structure to the information transfer in the domain
of rule 18 (see Fig. 6(g)). As described in Section IV D,
this domain is of spatial and temporal period 2, with
every second site being "0" and every other site being
either a "0" or a "1". Since the "0"s at every second
site are completely predictable (in the absence of domain
walls) given their past history, $h_\mu(i, n + 1, k)$ at these
points approaches zero bits. On the other hand, at every
other site $h_\mu(i, n + 1, k)$ approaches 1 bit since observations
of a "0" or a "1" are roughly equally likely with
the past history indicating this alternate phase of the
background. This result complements our observations
in [55] of the local transfer entropies here. As shown for
$t(i, j = 1, n + 1, k = 16)$ in Fig. 6(f), local apparent transfer
entropy approaches 0 bits at each site in the domain:
the "0"s at every second site are completely predictable
from their pasts, while the alternate sites require both
neighboring sources to predict the outcome. The local
complete transfer entropies therefore measure approximately
1 bit in their ability to finally determine the sites
in the alternate phase, but measure no transfer for the
"0" sites in the primary phase. Summing the respective
pairs of apparent and complete transfer entropies (as per
Eq. (HH)) therefore matches our results for $h_\mu(i, n + 1, k)$
as required. With these results, local transfer entropy
provided the first quantitative evidence for the long-held
conjecture that particles are the dominant (but not the
only) information transfer agents in CAs.
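
The accounting for the rule 18 domain can be laid out term by term. Assuming a neighborhood of $r = 1$ and writing $t^c$ for the local complete transfer entropy, the chain rule splits the local temporal entropy rate of a deterministic CA into one apparent and one complete term, in either order:

```latex
\begin{align}
h_\mu(i, n+1, k) &= t(i, j = 1, n+1, k) + t^c(i, j = -1, n+1, k) \\
                 &= t(i, j = -1, n+1, k) + t^c(i, j = 1, n+1, k), \\
\text{primary phase (``0'' sites):} \quad 0 &\approx 0 + 0 \ \text{bits}, \\
\text{alternate phase:} \quad 1 &\approx 0 + 1 \ \text{bits}.
\end{align}
```

The approximate values in the last two lines are those described in the text for the domain of rule 18; they are not new measurements.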

The highlighting of structure by local transfer entropy
is similar to results from other methods of filtering for
structure in CAs [HI [H, [HI, ill , but subtly different in
revealing the leading edges of gliders as the major information
transfer elements in the glider structure, and providing
multiple profiles (one for each direction or channel
of information transfer). Note that while achieving the
limit $k \to \infty$ is computationally infeasible, at least
a significant $k$ was required to achieve reasonable
estimates of the transfer entropy; without this, the active
information storage was not eliminated from the transfer
entropy measurements in the domains, and the measure
did not distinguish the particles from the domains [55].
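
The dependence on the history length $k$ can be explored with a small plug-in estimator. The sketch below is an illustration only, not the authors' implementation: the function names `run_eca` and `local_apparent_te` are hypothetical, the lattice is assumed periodic with a random initial row, and probabilities are estimated by simple counting over all space-time points.

```python
import math
import random
from collections import Counter

def run_eca(rule, width, steps, seed=0):
    """Evolve an elementary CA (periodic boundaries) from a random initial row."""
    rng = random.Random(seed)
    table = [(rule >> v) & 1 for v in range(8)]  # output bit for neighborhood value v
    row = [rng.randint(0, 1) for _ in range(width)]
    rows = [row]
    for _ in range(steps - 1):
        row = [table[(row[i - 1] << 2) | (row[i] << 1) | row[(i + 1) % width]]
               for i in range(width)]
        rows.append(row)
    return rows

def local_apparent_te(rows, k, j):
    """Plug-in estimates of t(i, j, n+1, k); the source is cell i-j at step n."""
    width = len(rows[0])
    obs = {}
    for n in range(k - 1, len(rows) - 1):
        for i in range(width):
            past = tuple(rows[m][i] for m in range(n - k + 1, n + 1))
            obs[(i, n + 1)] = (past, rows[n][(i - j) % width], rows[n + 1][i])
    n_psx = Counter(obs.values())                      # (past, source, next)
    n_ps = Counter((p, s) for p, s, _ in obs.values())  # (past, source)
    n_px = Counter((p, x) for p, _, x in obs.values())  # (past, next)
    n_p = Counter(p for p, _, _ in obs.values())        # past only
    return {
        key: math.log2((n_psx[(p, s, x)] / n_ps[(p, s)]) / (n_px[(p, x)] / n_p[p]))
        for key, (p, s, x) in obs.items()
    }

rows = run_eca(rule=110, width=200, steps=120, seed=1)
te = local_apparent_te(rows, k=4, j=-1)
print(sum(te.values()) / len(te))  # average apparent transfer entropy, in bits
```

The returned dictionary maps each space-time point `(i, n + 1)` to its local value, so the same run can be repeated with different `k` to see how short histories leave storage effects inside the transfer estimate. The small lattice and `k=4` here are chosen for speed and will not reproduce the paper's $k = 16$ figures.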

Also, a particularly relevant result for our purposes
is the finding of negative values of transfer entropy for
some space-time points in particles moving orthogonal
to the direction of measurement. This is displayed for
$t(i, j = 1, n, k = 16)$ in rule 54 (Fig. 2(f)), and
$t(i, j = -1, n, k = 16)$ for rule 110 (Fig. 4(f)), and also occurs
for rule 18 (results not shown). In general this is because
the source, as part of the domain, suggests that
this same domain found in the past of the destination will
continue; however, since the next state of the destination
forms part of the particle, this suggestion proves to be
misinformative. For example, consider the "x" marks in
Fig. 3 which denote parts of the right-moving $\gamma^+$ gliders.
If we now examine the source at the right (still in the
domain), we have $p(x_{i,n+1} \mid x_{i,n}^{(k=16)}, x_{i+1,n}) = 0.13$, giving
$t(i, j = -1, n, k = 16) = -0.90$ bits: this is negative because
the source (still in the domain) was misinformative
about the destination.
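
The sign of this value follows from the same log-ratio as before, now with a conditional probability that falls below the history-only probability of 0.25 quoted earlier for these points. As a worked check, taking $j = -1$ for the right-hand source $x_{i+1,n}$ and using the rounded probabilities:

```latex
\begin{equation}
t(i, j = -1, n, k = 16)
  = \log_2 \frac{p(x_{i,n+1} \mid x_{i,n}^{(k=16)}, x_{i+1,n})}{p(x_{i,n+1} \mid x_{i,n}^{(k=16)})}
  \approx \log_2 \frac{0.13}{0.25} \approx -0.94 \ \text{bits}.
\end{equation}
```

This is close to the reported $-0.90$ bits; the small gap is consistent with the quoted probabilities having been rounded to two decimal places.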

Regarding the local information transfer structure of 
rule 22, we note similar results as for local information 
storage. There is much information transfer here (in fact 
the average value $T(j = 1, k = 16) = 0.19$ bits is greater
than for rule 110 at 0.07 bits), although there is no coher- 
ent structure to this transfer. Again, this demonstrates 
the utility of local information metrics in providing more 
detailed insights into system dynamics than their global 
averages. 

In this section, we have described how the local trans- 
fer entropy quantifies the information transfer at space- 
time points within a system, and provides evidence that 
particles are the dominant information transfer agents in 
CAs. We have also introduced the collective transfer en- 
tropy to quantify the joint information contribution from 
all causal information contributors, and measured this in 
deterministic systems using the temporal entropy rate.
However, we have not yet separately identified collision
events in CAs: to complete our exploration of the infor- 
mation dynamics of computation, we now consider the 
nature of information modification. 



VI. INFORMATION MODIFICATION 

Langton interpreted information modification as in- 
teractions between transmitted and/or stored informa- 
tion which resulted in a modification of one or the other 
0. CAs provide an illustrative example, where the term 
interactions is generally interpreted to mean collisions 
of particles (including blinkers as information storage), 
with the resulting dynamics involving something other 
than the incoming particles continuing unperturbed. The 
resulting dynamics could involve zero or more particles 
(with an annihilation leaving only a background domain), 
and perhaps even some of the incoming particles. The 
number of particles resulting from a collision has been 
studied elsewhere [43]. Given the focus on perturbations
in the definition here, it is logical to associate a collision 
event with the modification of transmitted and/or stored 
information, and to see it as an information processing 
or decision event. Indeed, as an information processing
event, the important role of collisions in determining the
dynamics of the system is widely acknowledged, e.g.
in the $\phi_{par}$ density classification.

Attempts have previously been made to quantify in- 
formation modification or processing in a system [ijj fisl - 
[TtI . However, these have either been too specific to al- 
low portability across system types (e.g. by focusing on 
the capability of a system to solve a known problem, 
or measuring properties related to the particular type of 
system being examined), focus on general processing as 
movement or interpretation of information rather than 
specifically the modification of information, or are not 
amenable to measuring information modification at local 
space-time points within a distributed system. In this 
section, we present the separable information as a tool to 
detect non-trivial information modification events, and 
demonstrate it as the first measure to identify collisions 
in CAs as such. 



A. Local separable information 

We begin by considering what it means for a particle
to be modified. For the simple case of a glider, a modification
is simply an alteration to the predictable periodic
pattern of the glider's dynamics. At such points, an observer
would be surprised or misinformed about the next
state of the glider, having not taken account of the entity
about to perturb it. This interpretation is a clear reminder
of our earlier comments that local apparent transfer
entropy $t(i, j, n)$ and local active information storage
$a(i, n)$ were negative where the respective information
sources were misinformative about the next state of the
information destination (in the context of the destination's
past for transfer entropy). Local active information
storage was misinformative at gliders, and local apparent
transfer entropy was misinformative at gliders traveling
in the orthogonal direction to the measurement. This
being said, one expects that the local apparent transfer
entropy measured in the direction of glider motion will be
more informative about its evolution than any misinformation
conveyed from other sources. However, where the
glider is modified by a collision with another glider, we
can no longer expect the local apparent transfer entropy
in its macroscopic direction of motion to remain informative
about its evolution. Assuming that the incident
glider is also perturbed, the local apparent transfer entropy
in its macroscopic direction of motion will also not
be informative about its evolution at this collision point.
We expect the same argument to be true for irregular
particles, or domain walls.

As such, we make the hypothesis that at the spatiotem- 
poral location of a local information modification event or 
collision, separate inspection of each information source 
will misinform an observer overall about the next state of 
the modified information destination. More specifically, 
the information sources referred to here are the past his- 
tory of the destination (via the local active information 
storage) and each other causal information contributor 
(examined in the context of the past history of the des- 
tination, via their local apparent transfer entropies). 

We quantify the total information gained from separate
observation of the information storage and information
transfer contributors as the local separable information
$s_X(n)$:

$$s_X(n) = a_X(n) + \sum_{Y \in \mathcal{V}_X \setminus X} t_{Y \to X}(n), \qquad (30)$$

with the subscripts indicating the destination and source
variables. Again, the separable information $S_X$ denotes
the average $S_X = \langle s_X(n) \rangle$. For CAs, where the causal
information contributors are homogeneously within the
neighborhood $r$, we write the local separable information
in lattice notation as:

$$s(i, n) = a(i, n) + \sum_{j = -r,\, j \neq 0}^{+r} t(i, j, n). \qquad (31)$$

We use $s(i, n, k)$ to represent finite-$k$ estimates, and show
$s(i, n, k)$ diagrammatically in Fig. 8(b) [73].

As inferred earlier, we expect the local separable information
to be positive or highly separable where separate
observations of the information contributors are informative
overall regarding the next state of the destination.
This may be interpreted as a trivial information modification,
because information storage and transfer are
not interacting in any significant manner. More importantly,
we expect the local separable information to be
negative at spatiotemporal points where an information
modification event or collision takes place. Here, separate
observations are misleading overall because a non-trivial
information modification is taking place (i.e. the
information storage and transfer are interacting). It is
thus clear how we can understand information modification
as the interaction between information storage and
information transfer.

Importantly, this formulation of non-trivial informa- 
tion modification aligns with the descriptions of complex 
systems as consisting of (a large number of) elements in- 
teracting in a non-trivial fashion (2^ . and of emergence 
as where "the whole is greater than the sum of its parts".
Here, we quantify the sum of the parts in $s(i, n)$, and "the
whole" refers to examining all information sources to- 
gether; the whole is greater where all information sources 
must be examined together in order to receive positive in- 
formation on the next state of the examined entity. That 
being said, there is no quantity representing "the whole" 
as such, simply the indication that the sources must be 
examined together. We emphasize that $s(i, n)$ is not the
total information an observer needs to predict the state 
of the destination; this is measured by the single-site entropy
$h(i, n)$ (see Section

Finally, we introduce the notation $S^+(k)$ and $S^-(k)$
as the averages of positive and negative local values of
$s(i, n, k)$ in contributing to the average $S(k)$. We have
for example $S^+(k) = \langle s^+(i, n, k) \rangle$, where:

$$s^+(i, n, k) = \begin{cases} s(i, n, k) & \text{if } s(i, n, k) \geq 0 \\ 0 & \text{if } s(i, n, k) < 0 \end{cases} \qquad (32)$$

while $S^-(k) = \langle s^-(i, n, k) \rangle$ is defined in the opposite
manner.
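
The split into positive and negative contributions can be sketched as follows. This assumes that "the opposite manner" means $s^-$ keeps the negative local values and is zero elsewhere, so that both averages run over all points and $S = S^+ + S^-$; the function name and the sample values are hypothetical.

```python
def split_separable_averages(s_values):
    """Return (S, S_plus, S_minus): averages over ALL points of s, s^+ and s^-."""
    count = len(s_values)
    s_plus = sum(s for s in s_values if s >= 0) / count   # negative points count as 0
    s_minus = sum(s for s in s_values if s < 0) / count   # non-negative points count as 0
    return s_plus + s_minus, s_plus, s_minus

# Hypothetical local values in bits, for illustration only.
S, S_plus, S_minus = split_separable_averages([2.95, 0.40, -1.19, 0.10])
print(S, S_plus, S_minus)
```

Under this reading the averages reported in Table I are additive to within rounding, e.g. for rule 18, $0.46 + (-0.14) = 0.32$.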



B. Local separable information results 

The simple gliders in ECA rule 54 give rise to relatively
simple collisions which we focus on in our discussion here.
The positive values of $s(i, n, k = 16)$ for rule 54 are displayed
in Fig. 2(g); notice that these are concentrated
in the domain regions and at the stationary gliders ($\alpha$
and $\beta$). As expected, these regions are undertaking trivial
computations only. Fig. 2(h) displays the negative
values of $s(i, n, k = 16)$, with their positions marked in
Fig. 2(i). The dominant negative values are clearly concentrated
around the areas of collisions between the gliders,
including collisions between the traveling gliders only
(marked by "A") and between the traveling gliders and
the stationary gliders (marked by "B", "C" and "D").

Collision "A" involves the $\gamma^+$ and $\gamma^-$ particles interacting
to produce a $\beta$ particle ($\gamma^+ + \gamma^- \to \beta$ [43]).
The only information modification point highlighted is
one time step below the point at which the gliders
naively appear to collide (see close-up of raw states in
Fig. 3). The periodic pattern in the past of the destination
breaks there; however, the neighboring sources are
still able to support separate prediction of the state (i.e.
$a(i, n, k = 16) = -1.09$ bits, $t(i, j = 1, n, k = 16) = 2.02$
bits and $t(i, j = -1, n, k = 16) = 2.02$ bits, giving
$s(i, n, k = 16) = 2.95$ bits). This is no longer the case,
however, where our metric has successfully identified the
modification point; there we have $a(i, n, k = 16) = -3.00$
bits, $t(i, j = 1, n, k = 16) = 0.91$ bits and
$t(i, j = -1, n, k = 16) = 0.90$ bits, with $s(i, n, k = 16) = -1.19$
bits suggesting a non-trivial information modification. A
delay is also observed before the identified information
modification points of collision types "B" ($\gamma^+ + \beta \to \gamma^-$,
or vice-versa in $\gamma$-types), "C" ($\gamma^- + \alpha \to \alpha + 2\gamma^+$,
or vice-versa) and "D" ($2\gamma^+ + \alpha + 2\gamma^- \to \alpha$); possibly
these delays represent a time-lag of information processing.
Not surprisingly, the results for these other collision
types imply that the information modification points are
associated with the creation of new behavior: in "B" and
"C" these occur along the newly created $\gamma$ gliders, and
for "C" and "D" in the new $\alpha$ blinkers.
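
The two sums quoted for collision "A" can be checked directly against the lattice form of the separable information with $r = 1$. This is a minimal sketch; the helper name `local_separable_information` and the dictionary layout keyed by channel $j$ are our own choices.

```python
def local_separable_information(a, t_by_channel):
    """s(i, n) = a(i, n) + sum of t(i, j, n) over the channels j != 0."""
    return a + sum(t for j, t in t_by_channel.items() if j != 0)

# Values quoted in the text for collision "A" in rule 54 (k = 16, in bits).
before = local_separable_information(-1.09, {1: 2.02, -1: 2.02})
at_modification = local_separable_information(-3.00, {1: 0.91, -1: 0.90})
print(round(before, 2), round(at_modification, 2))  # 2.95 -1.19
```

One step before the modification point the two transfer terms outweigh the misinformative storage term, so $s$ stays positive; at the modification point the storage term dominates and $s$ turns negative.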

Importantly, weaker information modification points
continue to be identified at every second point along all
the $\gamma^+$ and $\gamma^-$ particles after the initial collisions (these
are too weak to appear in Fig. 2(h) but can be seen for
a similar glider in rule 110 in Fig. 4(i)). This was unexpected
from our earlier hypothesis. However, these
events can be understood as non-trivial computations of
the continuation of the glider in the absence of a collision;
in effect they are virtual collisions between the real
glider and the absence of an incident glider. These weak
collision events are more significant in the wake of real
collisions, since incident gliders are relatively more likely
in these areas. Interestingly, this finding is analogous to
the small but non-zero information transfer in periodic
domains indicating the absence of gliders.

We also note that measurements of local separable
information must be performed with a reasonably
large value of $k$. Earlier, we observed that for appropriate
measurement of information storage and transfer
$k$ should be selected to be as large as possible for accuracy,
at least larger than the scale of the period of the
regular background domain for filtering purposes. Here,
using $k < 4$ could not distinguish any information modification
points clearly from the domains and particles,
and even $k < 8$ could not distinguish all the modification
points (results not shown). Correct quantification of
information modification requires satisfactory estimates
of information storage and transfer, and accurate distinction
between the two.

We observe similar results in the profile of $s(i, n, k = 10)$
for $\phi_{par}$, confirming the particle collisions here
as non-trivial information modification events, and therefore
completing the evidence for all of the conjectures
about this human-understandable computation.

The results for $s(i, n, k = 16)$ for ECA rule 110 (see
Fig. 4(h) and Fig. 4(i)) are also similar to those for rule
54. Here, we have collisions "A" and "B" which show
non-trivial information modification points slightly delayed
from the collision in a similar fashion to those for
rule 54. We note that collisions between some of the
more complex glider structures in rule 110 (not shown)
exhibit non-trivial information modification points which
are more difficult to interpret, and which are even more
delayed from the initiation of the collision. The larger
delay is perhaps a reflection of the more complex
gliders requiring more time steps for the processing to
take place. An interesting result not seen for rule 54 is a
collision where an incident glider is absorbed by a blinker
(see label "C"), without any modification to the absorbing
blinker. No information modification is detected for
this absorption event by $s(i, n, k = 16)$: this is as expected
because the information storage for the absorbing
blinker is sufficient to predict the dynamics at this interaction.

As a further test of the measure, we examine collisions
between the domain walls of rule 18. As displayed in
Fig. 6(i), the collision between the domain walls is quite
clearly highlighted as the dominant information modification
event for this rule. The initial information modification
event is clearly where one would naively identify
the collision point, yet it is followed by two secondary
information modification points separated by two time
steps. At the raw states of these three collision points in
Fig. 6(a), the outer domains have effectively coalesced;
however, the observer cannot be certain that the new domain
has taken hold at this particular cell until observing
a "1" at the alternate phase. As such, information modification
events are observed at each point in the new
alternate phase until a "1" confirms the outer domains
have joined. This could provide a parallel to the delays
in information processing observed earlier.
Importantly, this result provides evidence that collisions
of irregular particles are information modification
events, as expected. It is also worth noting that these
collisions always result in the destruction of the domain
walls (and the inner domain), indicating that our method
captures destruction events as well as creation. (This is
also true for the $\gamma^+ + \gamma^- + \beta \to \emptyset$ event in rule 54, not
shown.) Also, as displayed in Fig. 6(h) and Fig. 6(i),
the background domain takes values of $s(i, n, k = 16)$
as either positive or negative with $a(i, n, k = 16)$, since
$t(i, j = 1, n, k = 16)$ and $t(i, j = -1, n, k = 16)$ vanish
at these points. This indicates that some minor information
processing is required to compute the "0" sites
in the alternate phase (whereas the "0" sites for every
second point and the "1"s in the alternate phase are
trivial computations dominated by information storage).
Finally, the domain walls here appear to give rise to only
positive values of $s(i, n, k = 16)$. This indicates that the
domain walls contain only trivial information modification,
in contrast with the gliders in rule 54 which required
a small amount of non-trivial information processing in
order to compute their continuation. This is perhaps
akin to the observation in [21] that the domain walls in
rule 146 are largely determined by the dynamics on either
side, i.e. they are not the result of any interaction per se
but of dominance from a single source at each time step.

We also apply $s(i, n, k = 16)$ to ECA rule 22, as displayed
in Fig. 7(h) and Fig. 7(i). As could be expected
from our earlier results, there are many points of both
positive and negative local separable information here.
The presence of negative values implies the occurrence of
non-trivial information modification, yet there does not
appear to be any structure to these profiles. Again, this
aligns well with the lack of coherent structure found using
the other measures in this framework and from the
local statistical complexity profile of rule 22 [21].

Here, we have introduced the local separable informa- 
tion to quantify information modification at each spa- 
tiotemporal point in a complex system. Information 
modification events occur where the separable informa- 
tion is negative, indicating that separate or independent 
inspection of the causal information sources (in the con- 
text of the destination's past) is misleading because of 
non-trivial interaction between these sources. The lo- 
cal separable information was demonstrated to provide 
the first quantitative evidence that particle collisions in 
CAs are the dominant information modification events 
therein. The measure is capable of identifying events in- 
volving both creation and destruction, and interestingly 
the location of an information modification event often 
appears delayed perhaps due to a time-lag in information 
processing. 



VII. IMPORTANCE OF COHERENT 
COMPUTATION 

Our framework has proven successful in locally iden- 
tifying the component operations of distributed compu- 
tation. We now consider whether this framework can 
provide any insights into the overall complexity of com- 
putation. In other words, what can our results say about 
the difference in the complex computations of rules 110 
and 54 as compared to rule 22 and others? 

We observe that the coherence of local computational 
structure appears to be the most significant differentiator 
here. "Coherence" implies a property of sticking together 
or a logical relationship [5^: in this context we use the 
term to describe a logical spatiotemporal relationship be- 
tween values in local information dynamics profiles. For 
example, the manner in which particles give rise to simi- 
lar values of local transfer entropy amongst spatiotempo- 
ral neighbors is coherent. From the spatiotemporal pro- 
files presented here, we note that rules 54 and 110 exhibit 
the largest amount of coherent computational structure, 
with rule 18 containing a smaller amount of less coher- 
ent structure. Rules 22 and 30 (results for rule 30 not 
shown) certainly exhibit all of the elementary functions 
of computation, but do not appear to contain any co- 
herent structure to their computations. This aligns well 
with similar explorations of local information structure 
for these rules, e.g. [21]. Using language reminiscent of
Langton's analysis Q, we suggest that complex systems 
exhibit very highly-structured coherent computation in
comparison to ordered systems (which exhibit coherence 
but minimal structure in a computation dominated by 
information storage) and chaotic systems (whose compu- 
tations are dominated by rampant information transfer 
eroding any coherence).

The key question then is whether one can perform any 
meta-analysis on the local information dynamics in order 
to quantify the differences in complexity of these compu- 
tations in terms of coherent structure. It is unlikely that 
such meta-analyses will produce system-wide measures 
of the complexity of computation (see comments in Section
III B); however, we attempt to at least produce useful
heuristics for this purpose. Here, we present three can- 
didate approaches: examining the average information 
dynamics, correlation analyses and state-space plots of 
the local values. 

An obvious first step is to check whether the average 
information dynamics provide useful summaries regard- 
ing the coherence and complexity of computation in each 
CA rule (despite the fact that the local values themselves 
provide far more detail). These averages are presented 
in Table I. One striking feature of the known complex
rules is that the apparent transfer entropy in each chan- 
nel is a large proportion of the complete transfer entropy 
for that channel. Apparent transfer entropy can only 
be high where the source has a clear coherent influence 
on the destination, while complete transfer entropy can 
separately be high where the influence of the source is 
mediated through an interaction. In the complex CAs, 
single sources can influence destinations without needing 
to interact with other sources, supporting the movement 
of coherent particle structures. Importantly, this occurs 
for multiple channels, meaning that we have bidirectional 
traveling coherent structures that should interact at some 
point. A similar feature is that their separable informa- 
tion approaches the entropy (again indicating dominance 
of single sources), along with a very low proportion of 
non-trivial information modification events (indicated by 
an almost vanishing $S^-$ and a small proportion of points
with $s(i, n) < 0$). Given our knowledge of the importance
of these events to computation, their shortage in complex 
computation initially seems counter-intuitive. However, 
we suggest that the power of these events lies in their 
subtlety: used judiciously they allow a complex coherent 
computation, but occurring too often they disturb the co- 
herence of the computation which then becomes chaotic. 
Similarly, we note that chaotic rules exhibit higher values 
of the complete transfer entropy along with lower values 
of the apparent transfer entropy; this provides another 
indication of significant interaction between components 
eroding the coherent computation in this regime. While 
these observations quantify neither coherence nor com- 
plexity of computation, they do provide useful heuristics 
for identifying those properties. 
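
The averages in Table I can be cross-checked against the identities that link them. The sketch below assumes the table's columns are ordered $H$, $H_\mu$, $A$, $T(j=1)$, $T(j=-1)$, $T^c(j=1)$, $T^c(j=-1)$, $S$, $S^+$, $S^-$ (our reading of the table), and the helper names are hypothetical.

```python
# Rows transcribed from Table I, ASSUMING this column order:
# H, H_mu, A, T(j=1), T(j=-1), Tc(j=1), Tc(j=-1), S, S+, S-
TABLE_I = {
    110: (0.99, 0.18, 0.81, 0.07, 0.11, 0.07, 0.11, 0.98, 0.98, -0.002),
    54:  (1.00, 0.27, 0.73, 0.08, 0.08, 0.19, 0.19, 0.89, 0.90, -0.01),
    22:  (0.93, 0.75, 0.19, 0.19, 0.19, 0.56, 0.56, 0.56, 0.62, -0.05),
    18:  (0.82, 0.53, 0.29, 0.01, 0.01, 0.52, 0.52, 0.32, 0.46, -0.14),
    30:  (1.00, 0.99, 0.01, 0.73, 0.01, 0.98, 0.26, 0.75, 0.82, -0.07),
}

def residuals(row):
    """Gaps (in bits) in the identities linking the averaged measures."""
    h, h_mu, a, t_p, t_m, tc_p, tc_m, s, s_plus, s_minus = row
    return {
        "H = A + H_mu": h - (a + h_mu),
        "H_mu = T(1) + Tc(-1)": h_mu - (t_p + tc_m),
        "H_mu = T(-1) + Tc(1)": h_mu - (t_m + tc_p),
        "S = A + T(1) + T(-1)": s - (a + t_p + t_m),
        "S = S+ + S-": s - (s_plus + s_minus),
    }

def min_channel_ratio(row):
    """Smaller of the two per-channel ratios of apparent to complete transfer."""
    _, _, _, t_p, t_m, tc_p, tc_m, _, _, _ = row
    return min(t_p / tc_p, t_m / tc_m)

for rule, row in TABLE_I.items():
    worst = max(abs(gap) for gap in residuals(row).values())
    print(rule, f"max gap {worst:.3f} bits", f"min T/Tc {min_channel_ratio(row):.2f}")
```

Under this reading every identity holds to within rounding (about 0.01 bits), and the smaller per-channel ratio of apparent to complete transfer entropy is largest for rules 110 and 54, in line with the heuristic described above.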

Our interpretation of coherence as meaning a logical
spatiotemporal relationship between local values suggests
that it may be measured via the autocorrelation within
profiles of each of the local information dynamics. Table II
shows for example the autocorrelation for local
transfer entropy values $t(i, j = 1, n, k = 16)$ separated
by 1 step in time and 1 step to the right in space. The
separation for the autocorrelation here is the same as the
interval across which the local transfer entropy is measured.
Notice that the rules exhibiting particles (110, 54
and 18) display the highest correlation values here, since
coherent particles exhibit spatiotemporal correlation in
the direction of particle motion. Similar results are observed
for $t(i, j = -1, n, k = 16)$ with autocorrelation
over 1 step in time and 1 step to the left in space. Again,
this observation is a useful heuristic, and parallels the
above observations regarding the proportion of average
transfer entropy values.
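
A lagged autocorrelation of this kind can be sketched as a Pearson correlation between each local value and the value one step later in time and one cell along in space. The function name, the periodic spatial boundary and the toy profile below are our assumptions; the text does not specify how the correlation was computed.

```python
import math

def lagged_autocorrelation(profile, dt=1, dx=1):
    """Pearson correlation between profile[n][i] and profile[n+dt][i+dx].

    profile is a list of rows (one per time step); space is treated as periodic.
    """
    width = len(profile[0])
    xs, ys = [], []
    for n in range(len(profile) - dt):
        for i in range(width):
            xs.append(profile[n][i])
            ys.append(profile[n + dt][(i + dx) % width])
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy profile: a single "particle" of high values moving one cell right per step.
toy = [[1.0 if i == n % 8 else 0.0 for i in range(8)] for n in range(16)]
print(lagged_autocorrelation(toy, dt=1, dx=1))   # lag along the particle's motion
print(lagged_autocorrelation(toy, dt=1, dx=-1))  # lag against the particle's motion
```

In the toy profile the correlation is maximal along the direction of motion and negative against it, which is the property that makes this lag a useful probe of coherent particles.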

Coherence may also be interpreted as a logical relationship
between profiles of the individual local information
dynamics (as three axes of complexity) rather than only
within them. To investigate this possibility, Fig. 9 plots
state-space diagrams of the local apparent transfer entropy
for $j = 1$ versus local active information storage,
while Fig. 10 plots the local separable information versus
local active information storage for several CA rules.
Each point in these diagrams represents the local values
of each measure at one spatiotemporal point, thereby
generating a complete state-space for the CA. Such state-space
diagrams are known to provide insights into structure
that are not visible when examining either metric in
isolation; for example, in examining structure in classes
of systems (such as logistic maps) by plotting average excess
entropy versus entropy rate while changing a system
parameter [59]. Here, however, we are looking at structure
within a single system rather than across a class of
systems.



The state-space diagrams for rule 110 (Fig. 9(a) and
Fig. 10(a)) exhibit interesting structure, with significant
clustering around certain areas and lines in the state
space, reflecting its status as a complex rule. (The two
diagonal lines are upper limits representing the boundary
condition $t^c(i, j = -1, n, k = 16) \geq 0$ for both destination
states "0" and "1".) On the other hand, the example
state-space diagrams for rule 30 (Fig. 9(b) and Fig. 10(b))
exhibit minimal structure (apart from the mathematical
upper limit), with a smooth spread of points across
the space reflecting its underlying chaotic nature. From
the apparent absence of coherent structure in its space-time
information profiles, one may expect state-space diagrams
for rule 22 to exhibit a similar absence of structure
to rule 30. As shown by Fig. 9(c) and Fig. 10(c),
however, this is not the case: the state-space diagrams for
rule 22 exhibit significant structure, with similar clustering
to that of rule 110.

To attempt to quantify the coherence of the struc- 
ture here, we measure the correlation coefficient between 
the values of t{i,j = l,n, fc = 16) and a(i,n,k = 16) 
for example (see Table [Hi. However, the correlation 
measures linear relationships alone; we also measure the 
mutual information 74j| (see Table ITT)) between the pairs 
{t{i,j — l,n,k — 16),a(i,rt,fc — 16)) and {s{i,n,k — 
I6),a{i,n,k — 16)) as a more general measure of their 
underlying relationship. The mutual information re- 
sults suggest (as expected) that rules 110 and 54 display 
strong relationships between all measures, with signifi- 

TABLE I. Table of average information dynamics (all with $k = 16$, values to 2 decimal places except $S^-$ for rule 110) with units in bits (except for $p(s(i,n) < 0)$, the proportion of space-time points with negative local separable information), for several ECA rules.

ECA rule  H     H_mu  A     T(j=1)  T(j=-1)  T^c(j=1)  T^c(j=-1)  S     S+    S-      p(s(i,n) < 0)
110       0.99  0.18  0.81  0.07    0.11     0.07      0.11       0.98  0.98  -0.002  0.003
54        1.00  0.27  0.73  0.08    0.08     0.19      0.19       0.89  0.90  -0.01   0.03
22        0.93  0.75  0.19  0.19    0.19     0.56      0.56       0.56  0.62  -0.05   0.09
18        0.82  0.53  0.29  0.01    0.01     0.52      0.52       0.32  0.46  -0.14   0.25
30        1.00  0.99  0.01  0.73    0.01     0.98      0.26       0.75  0.82  -0.07   0.08


TABLE II. Table of the autocorrelation between values of local transfer entropy separated by 1 step in time and space ($t = t(i, j = 1, n, k = 16)$, $t' = t(i + 1, j = 1, n + 1, k = 16)$), correlation coefficient and mutual information (in bits) between values of local active information storage ($a = a(i, n, k = 16)$) and local transfer entropy ($t$) at the same space-time points, and mutual information (in bits) between values of local separable information ($s = s(i, n, k = 16)$) and local active information storage ($a$) for several ECA rules.

ECA rule  C_tt'  C_ta   I(t;a)  I(s;a)
110       0.19   -0.57  0.35    0.69
54        0.45   -0.54  0.58    0.57
22        0.09   -0.28  0.25    0.40
18        0.44   -0.23  0.09    1.48
30        0.03   -0.19  0.17    0.00


cant strength in these relationships for rule 22 also, and 
a strong relationship for $(s(i, n, k = 16), a(i, n, k = 16))$
in rule 18 (as these are highly correlated in the period-2
pattern in its domain). Again, this appears to be a useful
heuristic for coherence in computation, though the
state-space diagrams themselves contain much more detail
about the relationships between these axes of complexity.
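
The two heuristics above can be sketched as follows. This is a minimal illustration only: it assumes a simple histogram (plug-in) estimator for the mutual information, and the function name `correlation_and_mi`, the bin count and the synthetic data are our own illustrative choices, not the estimator settings used for Table II. The example shows why both numbers are useful: a purely nonlinear dependence gives a correlation coefficient near zero while the mutual information remains clearly positive.

```python
import numpy as np

def correlation_and_mi(x, y, bins=16):
    """Pearson correlation and a histogram (plug-in) mutual information estimate in bits."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    corr = np.corrcoef(x, y)[0, 1]
    # Joint distribution over a bins x bins grid spanning the observed ranges.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells (0 log 0 = 0)
    mi = float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
    return corr, mi

# Synthetic stand-ins for two local measures sampled at the same points.
rng = np.random.default_rng(0)
u = rng.normal(size=20000)
c_lin, mi_lin = correlation_and_mi(u, 2.0 * u + 0.1 * rng.normal(size=u.size))
c_sq, mi_sq = correlation_and_mi(u, u ** 2)   # dependent but uncorrelated
```

Note that a binned estimate of this kind depends on the number of bins and is biased upwards for small samples, so it is best used to compare rules under identical settings rather than as an absolute value.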

Importantly, the apparent information structure in the
state-space diagrams lends some credence to the claims of
complex behavior for rule 22 discussed in Section III C.
However, it is a very subtle type of structure, not complex
enough to be revealed in the individual local information
profiles shown here or by other authors (e.g. [21]). The
structure does not appear to be coherent in these individual
profiles, though the state space diagrams indicate a
coherent relationship between the local information
dynamics which may underpin coherent computation at other
scales. Given the subtlety of structure in the bounds of
our analysis, and using our mutual information heuristics,
at this stage we conclude that the behavior of this rule is
less complex than that exhibited by rules 110 and 54.



Here we have suggested that coherent information
structure is a defining feature of complex computation,
and presented a number of important techniques and
heuristics for inferring this property using local
information dynamics. A particular example is the
state-space diagram for local information dynamics, which
produces useful visual results and was shown to provide
interesting insight into the nature of computation in
rule 22.
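
To make the construction of such a state-space diagram concrete, the following is a minimal sketch under stated assumptions. It uses plug-in probability estimates pooled over all cells and time steps of a single run, a short history length `k=4` (rather than the `k = 16` used in this paper) so that it runs quickly, and periodic boundaries; the function names `run_eca`, `local_a_and_t` and `state_space_histogram` are hypothetical. It is not the implementation used to produce Figs. 9 and 10.

```python
import numpy as np

def run_eca(rule, width=200, steps=400, seed=0):
    """Evolve an elementary CA with periodic boundaries from a random initial row."""
    rng = np.random.default_rng(seed)
    # table[v] is the output for neighborhood value v = 4*left + 2*center + right.
    table = np.array([(rule >> v) & 1 for v in range(8)], dtype=np.int64)
    rows = [rng.integers(0, 2, size=width)]
    for _ in range(steps - 1):
        x = rows[-1]
        rows.append(table[4 * np.roll(x, 1) + 2 * x + np.roll(x, -1)])
    return np.array(rows)

def local_a_and_t(cells, k=4, j=1):
    """Local active information storage and local apparent transfer entropy
    (channel j, i.e. source cell i - j), both at the same space-time points."""
    steps, _ = cells.shape
    # past[n, i] encodes the k states x_{i,n-k+1}, ..., x_{i,n} as one integer.
    past = np.zeros_like(cells)
    for d in range(k):
        past[k - 1:] += cells[k - 1 - d: steps - d] << d
    nxt = cells[k:].ravel()                              # x_{i,n+1}
    pst = past[k - 1:-1].ravel()                         # x_{i,n}^{(k)}
    src = np.roll(cells[k - 1:-1], j, axis=1).ravel()    # x_{i-j,n}
    # Plug-in conditional probabilities from joint counts.
    n_x = np.bincount(nxt, minlength=2)
    n_p = np.bincount(pst)
    n_px = np.bincount(pst * 2 + nxt)
    n_ps = np.bincount(pst * 2 + src)
    n_psx = np.bincount((pst * 2 + src) * 2 + nxt)
    p_x = n_x[nxt] / nxt.size
    p_x_given_p = n_px[pst * 2 + nxt] / n_p[pst]
    p_x_given_ps = n_psx[(pst * 2 + src) * 2 + nxt] / n_ps[pst * 2 + src]
    a = np.log2(p_x_given_p / p_x)
    t = np.log2(p_x_given_ps / p_x_given_p)
    return a, t

def state_space_histogram(a, t, bins=40):
    """2D histogram of (a, t) pairs: the data behind a state-space diagram."""
    return np.histogram2d(a, t, bins=bins)

cells = run_eca(110)
a, t = local_a_and_t(cells, k=4, j=1)
counts, a_edges, t_edges = state_space_histogram(a, t)
```

Plotting `counts` (for example on a logarithmic color scale) gives a diagram of the same kind as Fig. 9; with such a short history and run length the values will differ from those reported here.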



VIII. CONCLUSION 

We have presented a complete quantitative framework
for the information dynamics of distributed computation
in complex systems. Our framework quantifies the
information dynamics in terms of the component operations
of universal computation: information storage, information
transfer and information modification. Importantly, the
framework describes the manner in which information
storage and transfer interact to produce non-trivial
computation where "the whole is greater than the sum of
the parts". Our framework places particular importance on
examining computation on a local scale. While averaged
or system-wide measures have their place in providing
summarized results, this focus on the local scale is vital
for understanding the information dynamics of computation
and provides many insights that averaged measures cannot.

We applied the framework to cellular automata, an
important example because of the weight of previous
studies on the nature of distributed computation in these
systems. Significantly, our framework provides
quantitative evidence for the widely accepted conjectures
that blinkers provide information storage in CAs,
particles are the dominant information transfer agents,
and particle collisions are the dominant information
modification events. In particular, this was demonstrated
for the human-understandable density classification
computation carried out by the rule $\phi_{par}$. This is
a fundamental contribution to our understanding of the
nature of distributed computation, and provides impetus
for the framework to be used for the analysis and design
of other complex systems.

FIG. 9: State space diagrams of local transfer entropy (one step to the right) $t(i, j = 1, n, k = 16)$ versus local active information $a(i, n, k = 16)$ at the same space-time point $(i, n)$ for several ECA rules: (a) 110, (b) 30 and (c) 22.

FIG. 10: State space diagrams of local separable information $s(i, n, k = 16)$ versus local active information $a(i, n, k = 16)$ at the same space-time point $(i, n)$ for several ECA rules: (a) 110, (b) 30 and (c) 22.

The application to CAs aligned well with other methods
of filtering for complex structure in CAs. However, our
work is distinct in that it provides several different
views of the system corresponding to each type of
computational structure. In particular, the results align
well with the insights of computational mechanics,
providing a strong connection between this field and the
local information dynamics of computation.

From our results, we also observed that coherent local
information structure is a defining feature of complex
distributed computation, and presented a number of
techniques to meta-analyze local information dynamics in
order to infer coherent complex computation. Here, our
framework provides further insight into the nature of
computation in rule 22 with respect to the accepted
complex rules 54 and 110. Certainly rule 22 exhibits
all of the elementary functions of computation, yet (in
line with [21]) there is no apparent coherent structure to
the profiles of its local information dynamics. On the
other hand, state space views of the interplay between
these local information dynamics reveal otherwise hidden
structure. Our framework is unique in its ability to
resolve both of these aspects. We conclude that rule 22
exhibits more structure than chaotic rules, yet the
subtlety of this structure prevents it from being
considered as complex as rules 110 and 54.

The major thrust of our future work is to apply this
framework to other systems, e.g. we are examining
computation in random Boolean networks as models of gene
regulatory networks [60] and in modular robotics [61].
Given the information-theoretic basis of this framework,
it is readily applicable to other systems. Furthermore,
we intend to explore the relationship between the
information dynamics description of distributed
computation and other perspectives of computation, for
example descriptions of collective computation (e.g. in
the light-cone model of computational mechanics [21]) and
of computational complexity [62].

APPENDIX A: TOTAL INFORMATION COMPOSITION

Here, we demonstrate the mathematical arrangement
of information storage and transfer for prediction of
the next state at any given space-time point in the
system [75].



First, note that the average information required to
predict the state at any spatiotemporal point is simply
the single cell entropy $H_X$ (Eq. (1)). We use the mutual
information expansion of Eq. (5) to express this entropy
in terms of the active information storage and entropy
rate estimates [76]:

$$H_{X'} = I_{X';X^{(k)}} + H_{X'|X^{(k)}}. \tag{A1}$$

For convenience, we switch to local notation with the
local entropy and local temporal entropy rate estimate
represented as $h(n+1)$ and
$h_\mu(n+1, k) = h(x_{i,n+1} \mid x_{i,n}^{(k)})$
respectively, and as $h(i, n+1)$ and $h_\mu(i, n+1, k)$ in
local lattice notation:

$$h(n+1) = a(n+1, k) + h_\mu(n+1, k), \tag{A2}$$
$$h(i, n+1) = a(i, n+1, k) + h_\mu(i, n+1, k). \tag{A3}$$

Logically, we can restate this as: the information to pre- 
dict a given cell site is the amount predictable from its 
past (the active memory) plus the remaining uncertainty 
after examining this memory. 
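
As a small worked illustration of this local decomposition, consider the following arithmetic. The probabilities are our own, chosen only for convenience and not taken from any CA: suppose the observed next state of a cell has prior probability 1/2, but probability 1/4 given its past $k$ states. Assuming the standard local definitions (local entropy as the surprise of the next state, local active information storage as the log-ratio of the conditional to the prior probability, and the local entropy rate as the conditional surprise), the three terms are:

```latex
\begin{align*}
h(i,n+1) &= -\log_2 p(x_{i,n+1}) = -\log_2 \tfrac{1}{2} = 1 \text{ bit},\\
a(i,n+1,k) &= \log_2 \frac{p(x_{i,n+1} \mid x_{i,n}^{(k)})}{p(x_{i,n+1})}
            = \log_2 \frac{1/4}{1/2} = -1 \text{ bit},\\
h_\mu(i,n+1,k) &= -\log_2 p(x_{i,n+1} \mid x_{i,n}^{(k)}) = -\log_2 \tfrac{1}{4} = 2 \text{ bits},\\
a(i,n+1,k) + h_\mu(i,n+1,k) &= -1 + 2 = 1 \text{ bit} = h(i,n+1).
\end{align*}
```

In this example the past is locally misinformative (negative $a$), so the remaining uncertainty exceeds the unconditioned surprise. The identity holds at every point regardless of sign, while the averages of both terms over all points are non-negative.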

We then consider the composition of this remaining
uncertainty, tailoring our notation to CAs. In doing so,
we alter Eq. (27) to represent a consecutive group of
neighbors:

$$v_{i,n}^{s,f} = \{ x_{i+q,n} \mid \forall q : s \le q \le f,\ q \neq 0 \}. \tag{A4}$$

We systematically expand the entropy rate estimate term
in Eq. (A3) by incrementally taking account of the
contribution of each other information source. As a first
step we identify the contribution from source $i + r$, the
local apparent transfer entropy $t(i, -r, n+1, k)$,
represented as a local conditional mutual information
$m(x_{i+r,n}; x_{i,n+1} \mid x_{i,n}^{(k)})$:

$$h(i, n+1) = a(i, n+1, k) + m(x_{i+r,n}; x_{i,n+1} \mid x_{i,n}^{(k)}) + h(x_{i,n+1} \mid x_{i+r,n}, x_{i,n}^{(k)}). \tag{A5}$$

The systematic expansion is performed by using Eq. (7)
iteratively on the rightmost conditional entropy term to
produce:

$$h(i, n+1) = a(i, n+1, k) + \sum_{j=-r}^{+r} m(x_{i+j,n}; x_{i,n+1} \mid v_{i,n}^{j+1,r}, x_{i,n}^{(k)}) + h(x_{i,n+1} \mid v_{i,n}^{-r,r}, x_{i,n}^{(k)}). \tag{A6}$$

The sum of incrementally conditioned mutual information
terms is a sum of transfer entropies from each element,
incrementally conditioned on the previously considered
sources in the neighborhood. The first such term in
Eq. (A5) is an apparent transfer entropy, and the last
term of the sum in Eq. (A6) is a complete transfer
entropy. The order in which these causal information
contributors are removed is arbitrary; the form of the
result will be the same. This sum is a collective transfer
entropy from the set of all causal sources $V$ to the
destination: $I_{V;X'|X^{(k)}}$, the average information
about the next state of the destination jointly added by
the causal information sources that was not contained in
the past of the destination. In local lattice notation, we
have the local collective transfer entropy $t(i, n+1, k)$:



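$$t(i, n+1, k) = \sum_{j=-r}^{+r} m(x_{i+j,n}; x_{i,n+1} \mid v_{i,n}^{j+1,r}, x_{i,n}^{(k)}) \tag{A7}$$
$$\phantom{t(i, n+1, k)} = \log_2 \frac{p(x_{i,n+1} \mid x_{i,n}^{(k)}, v_{i,n}^{-r,r})}{p(x_{i,n+1} \mid x_{i,n}^{(k)})}. \tag{A8}$$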



As before, $t(i, n+1)$ represents the most correct form in
the limit $k \to \infty$.
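
The step from the sum of incrementally conditioned terms to a single log-ratio can be checked by writing it out for the smallest case. The following is a sketch for $r = 1$, assuming that $v_{i,n}^{s,f}$ denotes the set of neighbors at offsets $s, \ldots, f$ excluding the cell itself (so $v_{i,n}^{2,1}$ is empty, $v_{i,n}^{0,1} = v_{i,n}^{1,1} = \{x_{i+1,n}\}$ and $v_{i,n}^{-1,1} = \{x_{i-1,n}, x_{i+1,n}\}$), and that each local conditional mutual information $m$ is the log-ratio of the probability of the next state with and without the source in the conditioning:

```latex
\begin{align*}
t(i,n+1,k)
 &= \underbrace{\log_2 \frac{p(x_{i,n+1} \mid x_{i+1,n}, x_{i,n}^{(k)})}
                            {p(x_{i,n+1} \mid x_{i,n}^{(k)})}}_{j = +1}
  + \underbrace{\log_2 \frac{p(x_{i,n+1} \mid x_{i,n}, x_{i+1,n}, x_{i,n}^{(k)})}
                            {p(x_{i,n+1} \mid x_{i+1,n}, x_{i,n}^{(k)})}}_{j = 0,\ \text{equal to } 0}
  + \underbrace{\log_2 \frac{p(x_{i,n+1} \mid x_{i-1,n}, x_{i+1,n}, x_{i,n}^{(k)})}
                            {p(x_{i,n+1} \mid x_{i+1,n}, x_{i,n}^{(k)})}}_{j = -1} \\
 &= \log_2 \frac{p(x_{i,n+1} \mid x_{i-1,n}, x_{i+1,n}, x_{i,n}^{(k)})}
                {p(x_{i,n+1} \mid x_{i,n}^{(k)})}.
\end{align*}
```

The $j = 0$ term vanishes because $x_{i,n}$ is already contained in the past $x_{i,n}^{(k)}$, and the denominator of the $j = -1$ term cancels the numerator of the $j = +1$ term. The same cancellation occurs for any ordering of the sources, which is why the order of removal does not affect the result.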

Also, we note that the final term in Eq. (A6) is the
remaining local intrinsic uncertainty in the destination
after its past and all causal information contributors
have been considered; we label this as $u(i, n+1)$ (noting
that it is independent of $k$). As such, Eq. (A6) logically
displays the information to predict the destination at any
space-time point as the sum of the amount predictable
from its past (active information storage), the amount
then collectively predictable from its causal contributors
(collective transfer entropy), and remaining intrinsic
uncertainty:

$$h(i, n+1) = a(i, n+1, k) + t(i, n+1, k) + u(i, n+1). \tag{A9}$$

In a deterministic system, note that there is no remaining
intrinsic uncertainty $u(i, n+1)$. As such, for
deterministic systems (such as CAs) the temporal entropy
rate $h_\mu(i, n+1, k)$ is equal to the collective transfer
entropy $t(i, n+1, k)$.

As an example, the total information to predict the
next state of any destination in an ECA can be
represented as:

$$h(i, n+1) = a(i, n+1, k) + t(i, j = -1, n+1, k) + t^c(i, j = 1, n+1, k), \tag{A10}$$

or vice versa in $j = -1, 1$.
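
This decomposition can be verified numerically. The following is a minimal sketch, assuming plug-in probability estimates pooled over a single run with periodic boundaries and a short history `k=3` (rather than `k = 16`); the function names `run_eca` and `local_decomposition` and the variable names `t_app` and `t_comp` (apparent and complete transfer entropy) are hypothetical. Because the ECA update is deterministic and the past includes the current state of the cell, the estimated probability of the next state given the past and both neighbors is exactly 1, so the intrinsic uncertainty term vanishes at every point.

```python
import numpy as np

def run_eca(rule, width=128, steps=300, seed=1):
    """Evolve an elementary CA with periodic boundaries from a random initial row."""
    rng = np.random.default_rng(seed)
    # table[v] is the output for neighborhood value v = 4*left + 2*center + right.
    table = np.array([(rule >> v) & 1 for v in range(8)], dtype=np.int64)
    rows = [rng.integers(0, 2, size=width)]
    for _ in range(steps - 1):
        x = rows[-1]
        rows.append(table[4 * np.roll(x, 1) + 2 * x + np.roll(x, -1)])
    return np.array(rows)

def local_decomposition(cells, k=3):
    """Return local h, a, apparent t (j = -1), complete t (j = 1) and u."""
    steps, _ = cells.shape
    # past[n, i] encodes the k states x_{i,n-k+1}, ..., x_{i,n} as one integer.
    past = np.zeros_like(cells)
    for d in range(k):
        past[k - 1:] += cells[k - 1 - d: steps - d] << d
    nxt = cells[k:].ravel()                                # x_{i,n+1}
    pst = past[k - 1:-1].ravel()                           # x_{i,n}^{(k)}
    left = np.roll(cells[k - 1:-1], 1, axis=1).ravel()     # x_{i-1,n}, channel j = 1
    right = np.roll(cells[k - 1:-1], -1, axis=1).ravel()   # x_{i+1,n}, channel j = -1

    def cond_prob(context):
        """Plug-in estimate of p(next state | context) at every sample."""
        joint = np.bincount(context * 2 + nxt)
        ctx = np.bincount(context)
        return joint[context * 2 + nxt] / ctx[context]

    p_x = np.bincount(nxt, minlength=2)[nxt] / nxt.size
    p_past = cond_prob(pst)
    p_past_right = cond_prob(pst * 2 + right)
    p_all = cond_prob((pst * 2 + right) * 2 + left)

    h = -np.log2(p_x)                          # local entropy
    a = np.log2(p_past / p_x)                  # local active information storage
    t_app = np.log2(p_past_right / p_past)     # apparent transfer, j = -1
    t_comp = np.log2(p_all / p_past_right)     # complete transfer, j = 1
    u = -np.log2(p_all)                        # remaining intrinsic uncertainty
    return h, a, t_app, t_comp, u

h, a, t_app, t_comp, u = local_decomposition(run_eca(110), k=3)
identity_holds = np.allclose(h, a + t_app + t_comp)
no_intrinsic_uncertainty = np.allclose(u, 0.0)
```

Swapping the roles of `left` and `right` gives the "vice versa" form of the same identity.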